I am trying to find the pixels that are not white and create a bounding box around the image by examining their colors. I want to get the topmost, bottommost, leftmost and rightmost non-white pixels and use them to create the bounding box. I have used four loops, one travelling in from each side. I also want to remove the background color (which is mostly grey) and change it to pure white. I have implemented all of this, but because I am using so many loops the code runs too slow. How can I optimize the loops while keeping the functionality of finding the topmost, bottommost, leftmost and rightmost non-white pixels and replacing the background color?
The code below shows how I get the bounding box and remove the background at the same time. The mask is a black and white version of the image. If mask[i][j] == 0 the pixel is a different color, so I compare its position with the values stored in p, which gives me the bounding box. If mask[i][j] != 0 I change the corresponding pixel of the image to white.
# for bounding box
p = []
p.append(5000)
p.append(0)
p.append(5000)
p.append(0)
# scan each row from the left
for i in range(0, height):
    for j in range(0, width):
        if mask[i][j] == 0:
            if j < p[0]:
                p[0] = j
            break
        else:
            img[i, j] = [255, 255, 255]
# scan each row from the right
for i in range(0, height):
    for j in reversed(range(0, width)):
        if mask[i][j] == 0:
            if j > p[1]:
                p[1] = j
            break
        else:
            img[i, j] = [255, 255, 255]
# top down
for i in range(0, width):
    for j in range(0, height):
        if mask[j][i] == 0:
            if j < p[2]:
                p[2] = j
            break
        else:
            img[j, i] = [255, 255, 255]
# bottom up
for i in reversed(range(0, width)):
    for j in reversed(range(0, height)):
        if mask[j][i] == 0:
            if j > p[3]:
                p[3] = j
            break
        else:
            img[j, i] = [255, 255, 255]
So how can I optimize these loops while still getting the pixel values I need and being able to change the colors of the other image?
Background
To make the background white you can use a bitwise operation with the mask. To automate the creation of a mask read here.
Example:
import cv2
import numpy as np
# load image and mask
img = cv2.imread('image.png')
mask = cv2.imread('mask.png')
# combine images
res = cv2.bitwise_or(img,mask)
cv2.imshow("result", res)
cv2.waitKey(0)
cv2.destroyAllWindows()
The mask needs to have the same number of color channels as the image. All white areas in the mask will also become white in the image; black areas in the mask leave the corresponding image pixels unchanged.
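If your mask is single-channel (for example the output of a threshold), one way to make it match is to convert it to three channels first. A minimal sketch, assuming a grayscale mask.png:
import cv2

img = cv2.imread('image.png')
mask_gray = cv2.imread('mask.png', 0)                      # single-channel mask
mask_bgr = cv2.cvtColor(mask_gray, cv2.COLOR_GRAY2BGR)     # replicate to 3 channels
res = cv2.bitwise_or(img, mask_bgr)                        # white mask areas become white in img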
Bounding box
To get the bounding box you could use findContours. It takes a binary mask as input and returns a list of contours. You can use a contour to find the bounding box, rotated bounding box or minimum enclosing circle. The result may not be perfect depending on your input, but you can use it to increase performance as it greatly narrows the search.
Note: the input to findContours should have a black background. You can modify your mask using inverted_mask = cv2.bitwise_not(mask). Or, if you obtained your mask using thresholding, you can choose an inverted threshold type.
Result:
Code:
import cv2
import numpy as np
# load the mask as grayscale (use your own mask here)
mask = cv2.imread('mask.png', 0)
# color copy to draw the results on
mask2 = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
# find contours
contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    # get the bounding rect and draw it in red
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(mask2, (x, y), (x + w, y + h), (0, 0, 255), 3)
    # get the minimum-area (rotated) rectangle and draw it in blue
    rect = cv2.minAreaRect(cnt)
    box = cv2.boxPoints(rect)
    box = np.int0(box)
    cv2.drawContours(mask2, [box], 0, (255, 0, 0), 3)
# display result
cv2.imshow("img", mask2)
cv2.waitKey(0)
cv2.destroyAllWindows()
Maths
If you'd rather stick to checking array values, you can boost performance by first summing the rows and the columns. Summing is fast (and baked into numpy) and you can then discard rows/columns by checking a single value. You can see an example of this process in this answer. I would suggest using the mask with a black background for this, as you can compare the sum with zero. This essentially results in the red bounding box above. Of course, when a non-zero row/column is found you still have to loop over that one to find the exact coordinate.
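A minimal sketch of this idea, assuming mask is a single-channel numpy array where the object is non-zero and the background is zero (i.e. the inverted mask):
import numpy as np

# sum each row and each column; only rows/columns containing the object are non-zero
row_sums = mask.sum(axis=1)
col_sums = mask.sum(axis=0)
rows = np.nonzero(row_sums)[0]
cols = np.nonzero(col_sums)[0]
# first/last non-zero row and column give the bounding box
top, bottom = rows[0], rows[-1]
left, right = cols[0], cols[-1]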
Pfiew, that turned out much longer than intended...
@JohnColeman has a point. Nested Python loops will be relatively slow even with the best algorithms, and there are libraries that can optimize such operations.
The algorithm itself could be sped up by using the results of each loop to limit the range of the following loop. For example, suppose that when looking for the top non-white pixel you scanned from top to bottom as the outer loop and left to right as the inner loop, and found a pixel (a, b) (where a is the distance from the top). Then, when you go looking for the left pixel in the next section, you know that you can start scanning from a+1 in the top-down outer loop, and go no further than b - 1 in the left-right inner loop. Let's call the result (c, d).
Similarly, the bottom pixel can be no less than c in the vertical and no less than d in the horizontal.
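A rough sketch of this range-limiting idea, assuming mask is indexed as mask[row][col] with 0 marking non-white pixels (the second scan below starts at the top row itself to be safe; the exact bounds you can prune depend on the scan order):
# find the topmost non-white pixel (rows top to bottom, columns left to right)
top_row, top_col = None, None
for i in range(height):
    for j in range(width):
        if mask[i][j] == 0:
            top_row, top_col = i, j
            break
    if top_row is not None:
        break

# the leftmost non-white pixel cannot be above row top_row and cannot be
# right of column top_col, so the second scan covers a much smaller area
left_col = top_col
for i in range(top_row, height):
    for j in range(0, left_col):      # the bound shrinks as better candidates are found
        if mask[i][j] == 0:
            left_col = j
            break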
Related
Hi, I need to write a program that removes the demarcation marks from a grayscale image (an image with text in it).
I read about thresholding and blurring, but I still don't see how I can do it.
My image is an image of Hebrew text like this:
I need to remove the demarcation (assuming that the demarcation marks are the smallest elements in the image); the output needs to be something like this:
I want to write the code in Python using OpenCV. What topics do I need to learn to be able to do that, and how?
Thank you.
Edit:
I can only use cv2 functions.
The symbols you want to remove are significantly smaller than all the other shapes; you can use that to determine which ones to remove.
First use threshold to convert the image to binary. Next, you can use findContours to detect the shapes and then contourArea to determine whether a shape is larger than a threshold.
Finally, you can create a mask to remove the unwanted shapes, draw the larger symbols on a new image, or draw the smaller symbols in white over the original symbols in the original image - making them disappear. I used that last technique in the code below.
Result:
Code:
import cv2
# load image as grayscale
img = cv2.imread('1MioS.png',0)
# convert to binary. Inverted, so you get white symbols on black background
_ , thres = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY_INV)
# find contours in the thresholded image (this gives all symbols)
contours, hierarchy = cv2.findContours(thres, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# loop through the contours, if the size of the contour is below a threshold,
# draw a white shape over it in the input image
for cnt in contours:
    if cv2.contourArea(cnt) < 250:
        cv2.drawContours(img, [cnt], 0, (255), -1)
# display result
cv2.imshow('res', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Update
To find the largest contour, you can loop through them and keep track of the largest value:
maxArea = 0
for cnt in contours:
    currArea = cv2.contourArea(cnt)
    if currArea > maxArea:
        maxArea = currArea
print(maxArea)
I also whipped up a slightly more complex version that creates a sorted list of the indexes and sizes of the contours. It then looks for the largest relative difference in size between consecutive contours, so you know which contours are 'small' and which are 'large'. I do not know if this works for all letters/fonts.
# create a list of the indexes of the contours and their sizes
contour_sizes = []
for index, cnt in enumerate(contours):
    contour_sizes.append([index, cv2.contourArea(cnt)])
# sort the list based on the contour size.
# this changes the order of the elements in the list
contour_sizes.sort(key=lambda x:x[1])
# loop through the list and determine the largest relative distance
indexOfMaxDifference = 0
currentMaxDifference = 0
for i in range(1, len(contour_sizes)):
    sizeDifference = contour_sizes[i][1] / contour_sizes[i-1][1]
    if sizeDifference > currentMaxDifference:
        currentMaxDifference = sizeDifference
        indexOfMaxDifference = i
# loop through the list again, ending (or starting) at the indexOfMaxDifference, to draw the contour
for i in range(0, indexOfMaxDifference):
    cv2.drawContours(img, contours, contour_sizes[i][0], (255), -1)
To get the background color you can use minMaxLoc. This returns the lowest color value in an image and its position (also the max value, but you don't need that). If you apply it to the thresholded image - where the background is black - it will return the location of a background pixel (odds are it will be (0, 0)). You can then look up this pixel in the original color image.
# get the location of a pixel with background color
min_val, _, min_loc, _ = cv2.minMaxLoc(thres)
# load color image
img_color = cv2.imread('1MioS.png')
# get the bgr values of the background (min_loc is (x, y); numpy indexing is [row, col])
b, g, r = img_color[min_loc[1], min_loc[0]]
# convert from numpy types to plain ints
background_color = (int(b), int(g), int(r))
and then to draw the contours
cv2.drawContours(img_color,contours,contour_sizes[i][0],background_color,-1)
and of course
cv2.imshow('res', img_color)
This looks like a problem for template matching since you have what looks like a known font and can easily understand what the characters and/or demarcations are. Check out https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_template_matching/py_template_matching.html
Admittedly, the tutorial talks about finding the match; modification is up to you. In that case, you know the exact shape of the template itself, so using that information along with the location of the match, just overwrite the image data with the appropriate background color (based on the examples above, 255).
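A minimal sketch of that idea, assuming one demarcation mark has been cropped into a (hypothetical) template.png and using an illustrative match threshold of 0.8:
import cv2
import numpy as np

img = cv2.imread('1MioS.png', 0)
template = cv2.imread('template.png', 0)    # hypothetical crop of one demarcation mark
th, tw = template.shape[:2]

# find every location where the template matches well
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(res >= 0.8)               # illustrative match threshold

# overwrite each match with the background color (255 in this example)
for x, y in zip(xs, ys):
    img[y:y + th, x:x + tw] = 255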
You can solve it by removing all the small clusters.
I found a Python solution (using OpenCV) here.
For supporting smaller fonts, I added the following heuristic:
"The largest size of the demarcation cluster is 1/500 of the largest letter cluster".
The heuristic can be refined by statistical analysis (or improved with other heuristics, such as the locations of the demarcation marks relative to the letters).
import numpy as np
import cv2
I = cv2.imread('Goodluck.png', cv2.IMREAD_GRAYSCALE)
J = 255 - I # Invert I
img = cv2.threshold(J, 127, 255, cv2.THRESH_BINARY)[1] # Convert to binary
# https://answers.opencv.org/question/194566/removing-noise-using-connected-components/
nlabel,labels,stats,centroids = cv2.connectedComponentsWithStats(img, connectivity=8)
labels_small = []
areas_small = []
# Find largest cluster:
max_size = np.max(stats[:, cv2.CC_STAT_AREA])
thresh_size = max_size / 500 # Set the threshold to maximum cluster size divided by 500.
for i in range(1, nlabel):
    if stats[i, cv2.CC_STAT_AREA] < thresh_size:
        labels_small.append(i)
        areas_small.append(stats[i, cv2.CC_STAT_AREA])
mask = np.ones_like(labels, dtype=np.uint8)
for i in labels_small:
    I[labels == i] = 255
cv2.imshow('I', I)
cv2.waitKey(0)
Here is a MATLAB code sample (kept threshold = 200):
clear
I = imbinarize(rgb2gray(imread('בהצלחה.png')));
figure;imshow(I);
J = ~I;
%Clustering
CC = bwconncomp(J);
%Cover all small clusters with zeros.
for i = 1:CC.NumObjects
    C = CC.PixelIdxList{i}; %Cluster coordinates.
    %Fill small clusters with zeros.
    if numel(C) < 200
        J(C) = 0;
    end
end
J = ~J;
figure;imshow(J);
Result:
I have used opencv to create some contours, and I need to identify a specific point on a contour, which is usually the innermost point of a 'V' shape. In the attached image, the point I want to identify is shown by the green arrows.
On the left is an easy case, where identification can be done (for example) by computing a convex hull of the contour, and then finding the point furthest from the hull.
However, on the right of the attached image is a much more difficult case, where instead of 1 contour, I get several, and the nice 'V' shape is not present, making it impossible to identify the innermost point of the 'V'. As shown by the red dotted line, one solution might be to extrapolate the higher contour until it intersects with the lower one. Does anyone know how I might go about this? Or have a better solution?
For the record I have tried:
dilation/erosion (works when multiple contours are close together, otherwise not)
probabilistic Hough transform (tends to mislocate the target point)
Any pointers would be hugely appreciated.
This solution will work for the two images that you provided. This should also be a good solution for all other images that have a similar coloration and a 'v' shape (or at least a partial 'v' shape) that points to the right.
Let's take a look at the easier image first. I started by segmenting the image using color spaces.
# Convert frame to hsv color space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Define range of pink color in HSV
(b,r,g,b1,r1,g1) = 0,0,0,110,255,255
lower = np.array([b,r,g])
upper = np.array([b1,r1,g1])
# Threshold the HSV image to get only pink colors
mask = cv2.inRange(hsv, lower, upper)
Next, I found the mid_point where there was an equal amount of white above and below that row.
# Calculate the mid point
mid_point = 1
top, bottom = 0, 1
while top < bottom:
    top = sum(sum(mask[:mid_point, :]))
    bottom = sum(sum(mask[mid_point:, :]))
    mid_point += 1
Then, I floodfilled the image starting at the midpoint:
h, w = mask.shape[:2]  # image height and width
bg = np.zeros((h+2, w+2), np.uint8)
cv2.floodFill(mask, bg, (0, mid_point), 123)
Now that I have the floodfilled image, I know the point that I am looking for is the gray pixel that is the closest to the right side of the image.
# Find the gray pixel that is furthest to the right
idx = 0
while True:
    column = mask_temp[:, idx:idx+1]
    element_id, gray_px, found = 0, [], False
    for element in column:
        if element == 123:
            v_point = idx, element_id
            found = True
        element_id += 1
    # If no gray pixel is found, break out of the loop
    if not found: break
    idx += 1
The result:
Now for the harder image. In the image on the right, the 'v' does not fully connect:
To close the 'v', I iteratively dilated the mask and checked if it connected:
# Flood fill and dilate loop
k_size, iters = 1, 1
while True:
    bg = np.zeros((h+2, w+2), np.uint8)
    mask_temp = mask.copy()
    kernel = np.ones((k_size, k_size), np.uint8)
    mask_temp = cv2.dilate(mask_temp, kernel, iterations=iters)
    cv2.floodFill(mask_temp, bg, (0, mid_point), 123)
    cv2.imshow('mask', mask_temp)
    cv2.waitKey()
    k_size += 1
    iters += 1
    # Break out of the loop if the right side of the image is black
    if mask_temp[h-1, w-1] == 0 and mask_temp[1, w-1] == 0: break
This is the resulting output:
I want to loop through an image and remove or modify a pixel if it matches an RGB value within a threshold.
The goal is to remove the background of an image and feed the image to an OCR.
I have tried 2 different methods to do this.
Method 1:
Basically what I do is get the average background pixel value.
And then loop over all pixels and check which pixels equal the average background pixel.
for x in range(0, w):
    for y in range(0, h):
        if Pixel(img[y, x]).compare(pixel, threshold):
            img[y, x] = 255
        else:
            img[y, x] = 0
The compare function checks whether the pixel value is within the threshold of the reference pixel (>= reference - threshold and <= reference + threshold). If it returns true, the pixel is changed to white, else to black.
This works well, however it is way too slow when you use bigger pictures.
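Since the Pixel class itself is not shown here, this is only a rough sketch of what such a compare could look like (purely illustrative):
def compare(px, reference, threshold):
    # True if every channel of px lies within +/- threshold of the reference pixel
    return all(reference[c] - threshold <= px[c] <= reference[c] + threshold
               for c in range(3))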
Method 2:
Just use an opencv method to remove the background.
Simply:
img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
th3 = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 41, 2)
Results:
Feed normal image:
result method 1:
result method 2:
Feed inverted image:
result method 1:
result method 2:
The first method is way too slow, and the second method only works when the image has a whitish background, I guess.
I do need it for different background colors.
I found something about vectorizing the numpy array, but couldn't really find a good example of it.
To answer the question with a concrete example:
import numpy as np
# load an image as grayscale
# get the average background pixel value; how to compute it is out of the scope of this
# question (get_bg_avg_px_val is a placeholder, different methods can achieve it)
bg_avg = get_bg_avg_px_val(img)
th = 80
background_mask = np.logical_and((bg_avg - th) <= img, img <= (bg_avg + th))
text_mask = np.logical_or((bg_avg - th) >= img, img >= (bg_avg + th))
img[background_mask] = 255
img[text_mask] = 0
I am working on this image as source:
Applying the following code...
import cv2
import numpy as np
mser = cv2.MSER_create()
img = cv2.imread('C:\\Users\\Link\\Desktop\\test2.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()
regions, _ = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]
cv2.polylines(vis, hulls, 1, (0, 255, 0))
mask = np.zeros((img.shape[0], img.shape[1], 1), dtype=np.uint8)
for contour in hulls:
    cv2.drawContours(mask, [contour], -1, (255, 255, 255), -1)
text_only = cv2.bitwise_and(img, img, mask=mask)
cv2.imshow('img', vis)
cv2.waitKey(0)
cv2.imshow('img', mask)
cv2.waitKey(0)
cv2.imshow('img', text_only)
cv2.waitKey(0)
cv2.imwrite('C:\\Users\\Link\\Desktop\\test_o\\1.png', text_only)
...I am obtaining this as result (mask):
The question is this:
How can I merge the number 5 in the number series (157661546) into a single object, given that it is split into separate parts in the mask image?
Thanks
Have a look here, it seems like the exact answer.
Here instead is my version of the above code, fine-tuned for text extraction (with masking too).
Below is the original code from the previous article, "ported" to Python 3 and OpenCV 3, with MSER and bounding boxes added. The main difference with my version is how the grouping distance is defined: mine is text-oriented while the one below uses a free geometric distance.
import sys
import cv2
import numpy as np
def find_if_close(cnt1, cnt2):
    row1, row2 = cnt1.shape[0], cnt2.shape[0]
    for i in range(row1):
        for j in range(row2):
            dist = np.linalg.norm(cnt1[i] - cnt2[j])
            if abs(dist) < 25:  # <-- threshold
                return True
            elif i == row1-1 and j == row2-1:
                return False
img = cv2.imread(sys.argv[1])
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
cv2.imshow('input', img)
ret,thresh = cv2.threshold(gray,127,255,0)
mser = False
if mser:
    mser = cv2.MSER_create()
    regions = mser.detectRegions(thresh)
    hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
    contours = hulls
else:
    thresh = cv2.bitwise_not(thresh)  # wants black bg
    im2, contours, hier = cv2.findContours(thresh, cv2.RETR_EXTERNAL, 2)
cv2.drawContours(img, contours, -1, (0, 0, 255), 1)
cv2.imshow('base contours', img)
LENGTH = len(contours)
status = np.zeros((LENGTH,1))
print("Elements:", len(contours))
for i, cnt1 in enumerate(contours):
    x = i
    if i != LENGTH-1:
        for j, cnt2 in enumerate(contours[i+1:]):
            x = x + 1
            dist = find_if_close(cnt1, cnt2)
            if dist == True:
                val = min(status[i], status[x])
                status[x] = status[i] = val
            else:
                if status[x] == status[i]:
                    status[x] = i + 1
unified = []
maximum = int(status.max())+1
for i in range(maximum):
    pos = np.where(status == i)[0]
    if pos.size != 0:
        cont = np.vstack([contours[i] for i in pos])
        hull = cv2.convexHull(cont)
        unified.append(hull)
cv2.drawContours(img,contours,-1,(0,0,255),1)
cv2.drawContours(img,unified,-1,(0,255,0),2)
#cv2.drawContours(thresh,unified,-1,255,-1)
for c in unified:
    (x, y, w, h) = cv2.boundingRect(c)
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
cv2.imshow('result', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Sample output (the yellow blob is below the binary threshold conversion so it's ignored). Red: original contours, green: unified ones, blue: bounding boxes.
Probably there is no need to use MSER as a simple findContours may work fine.
------------------------
Starting from here is my old answer, from before I found the above code. I'm leaving it anyway as it describes a couple of different approaches that may be easier/more appropriate for some scenarios.
A quick and dirty trick is to add a small Gaussian blur and a high threshold before the MSER (or some dilate/erode if you prefer fancy things). In practice you just make the text bolder so that it fills small gaps. Obviously you can later discard this version and crop from the original one.
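A minimal sketch of that preprocessing, assuming gray is the grayscale input (the kernel size and threshold value are illustrative):
# blur slightly so small gaps between fragments get bridged,
# then use a high threshold so the dark text comes out bolder
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
_, bold = cv2.threshold(blurred, 200, 255, cv2.THRESH_BINARY)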
Otherwise, if your text is in lines, you may try to detect the average line centers (make a histogram of Y coordinates and find the peaks, for example). Then, for each line, look for fragments with a close average X. Quite fragile if the text is noisy/complex.
If you do not need to split each letter, getting the bounding box for the whole word may be easier: just split the fragments into groups based on a maximum horizontal distance between them (using the leftmost/rightmost points of each contour). Then use the leftmost and rightmost boxes within each group to find the whole bounding box. For multiline text, first group by the centroids' Y coordinate. A rough sketch of this grouping is shown below, before the implementation notes.
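The sketch assumes contours holds the letter-fragment contours of a single text line and uses an illustrative 15 px gap threshold:
# sort fragment bounding boxes left to right (boundingRect gives (x, y, w, h))
boxes = sorted(cv2.boundingRect(c) for c in contours)

words = []            # each word is a list of fragment boxes
current = [boxes[0]]
for box in boxes[1:]:
    prev_right = current[-1][0] + current[-1][2]
    if box[0] - prev_right <= 15:     # close enough: same word
        current.append(box)
    else:
        words.append(current)
        current = [box]
words.append(current)

# bounding box of each word from its leftmost/rightmost/top/bottom fragments
word_boxes = []
for w in words:
    x1 = min(b[0] for b in w)
    y1 = min(b[1] for b in w)
    x2 = max(b[0] + b[2] for b in w)
    y2 = max(b[1] + b[3] for b in w)
    word_boxes.append((x1, y1, x2 - x1, y2 - y1))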
Implementation notes:
Opencv allows you to create histograms but you probably can get away with something like this (worked for me on a similar task):
def histogram(vals, th=4, bins=400):
    hist = np.zeros(bins)
    for y_center in vals:
        bucket = int(round(y_center / 2.))  # <-- change this "2."
        hist[bucket-1] += 1
    print("hist: ", hist)
    hist = np.where(hist > th, hist, 0)
    return hist
Here my histogram is just an array with 400 buckets (my image was 800px high so each bucket catches two pixels, that is where the "2." comes from). Vals are the Y coordinates of the centroids of each fragment (you may want to ignore very small elements when you build this list). The th threshold is there just to remove some noise. You should get something like this:
0,0,0,5,22,0,0,0,0,43,7,0,0,0
This list describes, moving top to bottom, how many fragments are at each location.
Now I ran another pass to merge the peaks into a single value (just scan the array and sum while it is non-zero and reset the count on first zero) getting something like this {y:count}:
{9:27, 20:50}
Now I know I have two text rows at y=9 and y=20. Now, or before, you assign each fragment to one line (again with an 8px threshold in my case). Then you can process each line on its own, finding "words". BTW, I have the same problem you have with broken letters; that's why I came here looking for MSER :). Notice that if you find the whole bounding box for the word, this problem only matters for the first/last letters: the other broken letters just fall inside the word box anyway.
Here is a reference for the erode/dilate thing, but gaussian blur/th worked for me.
UPDATE: I've noticed that there is something wrong in this line:
regions = mser.detectRegions(thresh)
I pass in the already-thresholded image (!?). This is not relevant for the aggregation part, but keep in mind that the MSER part is not being used as expected.
I created the following image (image 3) using a threshold mask (image 2) on image 1. I am trying to convert all the pixels outside of the central image of image 3 (of lungs) to one colour (for example black) using opencv. Basically so that I am left with just the image of the lungs against a uniform background (or even transparent). My problem has been the similarity of the very outer pixels to those inside the lungs on image 3. Is this possible to do using opencv?
Simply floodFill() the mask from the boundaries of the image with black. See the flood fill step in my answer here to see it used in another scenario.
Similarly, you can use floodFill() to find which pixels connect to the edges of the image, which means you can use it to put back the holes in the lungs from thresholding. See my answer here for a different example of this hole-filling process.
I copy and pasted the code straight from the above answers, only modifying the variable names:
import cv2
import numpy as np
img = cv2.imread('img.jpg', 0)
mask = cv2.imread('mask.png', 0)
# flood fill to remove mask at borders of the image
h, w = img.shape[:2]
for row in range(h):
    if mask[row, 0] == 255:
        cv2.floodFill(mask, None, (0, row), 0)
    if mask[row, w-1] == 255:
        cv2.floodFill(mask, None, (w-1, row), 0)
for col in range(w):
    if mask[0, col] == 255:
        cv2.floodFill(mask, None, (col, 0), 0)
    if mask[h-1, col] == 255:
        cv2.floodFill(mask, None, (col, h-1), 0)
# flood fill background to find inner holes
holes = mask.copy()
cv2.floodFill(holes, None, (0, 0), 255)
# invert holes mask, bitwise or with mask to fill in holes
holes = cv2.bitwise_not(holes)
mask = cv2.bitwise_or(mask, holes)
# display masked image
masked_img = cv2.bitwise_and(img, img, mask=mask)
masked_img_with_alpha = cv2.merge([img, img, img, mask])
cv2.imwrite('masked.png', masked_img)
cv2.imwrite('masked_transparent.png', masked_img_with_alpha)
Edit: As an aside, "transparency" is basically a mask: the values tell you how opaque each pixel is. If a pixel is 0 it's totally transparent, if it's 255 (for uint8) it's completely opaque, and if it's in between it's partially transparent. So the exact same mask used here at the end could be stacked onto the image as a fourth alpha channel (you can use cv2.merge or numpy to stack), where it will make every 0 pixel in the mask totally transparent; simply save the image as a PNG to keep the transparency. The above code creates an image with alpha transparency as well as an image with a black background.
Here the background looks white because it is transparent, but if you save the image to your system you'll see it actually is transparent. FYI OpenCV actually ignores the alpha channel during imshow() so you'll only see the transparency on saving the image.
Edit: One last note...here your thresholding has removed some bits of the lungs. I've added back the holes from thresholding that occur inside the lungs, but this misses some chunks along the boundary that were removed. If you do contour detection on the mask, you can actually smooth those out a bit as well if it's important. Check out the "Contour Approximation" section of OpenCV's contour features tutorial. Basically it will try to smooth the contour while sticking within a certain epsilon distance of the actual contour. This might be useful and is easy to implement, so I figured I'd throw it in as a suggestion at the end here.
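A minimal sketch of that contour approximation, applied to the mask from the code above (the epsilon factor of 0.5% of the arc length is illustrative):
# approximate each contour of the mask and redraw it as a filled, smoothed shape
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
smoothed = np.zeros_like(mask)
for cnt in contours:
    epsilon = 0.005 * cv2.arcLength(cnt, True)   # illustrative epsilon
    approx = cv2.approxPolyDP(cnt, epsilon, True)
    cv2.drawContours(smoothed, [approx], -1, 255, -1)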