I want to loop through an image and remove or modify a pixel if it matches an RGB value within a threshold.
The goal is to remove the background of an image and feed the image to an OCR.
I have tried 2 different methods to do this.
Method 1:
Basically what I do is get the average background pixel value.
And then loop over all pixels and check which pixels match the average background pixel.
for x in range(0, w):
    for y in range(0, h):
        if Pixel(img[y, x]).compare(pixel, threshold):
            img[y, x] = 255
        else:
            img[y, x] = 0
The compare function checks whether the pixel is >=/<= the reference pixel -/+ the threshold value; if it returns True, the pixel is changed to white, otherwise to black.
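For reference, that compare logic boils down to something like this (shown as a standalone function just for illustration):
def compare(value, reference, threshold):
    # True if value lies within +/- threshold of the reference value
    return reference - threshold <= value <= reference + threshold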
This works well, but it is far too slow for bigger pictures.
Method 2:
Just use an opencv method to remove the background.
Simply:
img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
th3 = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 41, 2)
Results (images omitted): outputs of method 1 and method 2, for the normal input image and for the inverted input image.
The first method is way too slow, and the second method only works when the image has a whitish background, I guess.
I do need it for different background colors.
I found something about vectorizing the numpy array, but couldn't really find a good example of it.
To answer the question with a concrete example:
import numpy as np

# load an image as grayscale
# get the average background pixel value; how to do that is out of the
# scope of this question, there are different methods to achieve it
bg_avg = get_bg_avg_px_val(img)
th = 80
background_mask = np.logical_and((bg_avg - th) <= img, img <= (bg_avg + th))
text_mask = np.logical_or(img < (bg_avg - th), (bg_avg + th) < img)
img[background_mask] = 255
img[text_mask] = 0
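Since get_bg_avg_px_val was left out of scope, here is one hypothetical way to implement it, purely as a sketch: it assumes the image border is mostly background and averages those pixels.
def get_bg_avg_px_val(img):
    # hypothetical helper: average the outermost rows and columns,
    # assuming the border of the image is mostly background
    border = np.concatenate((img[0, :].ravel(), img[-1, :].ravel(),
                             img[:, 0].ravel(), img[:, -1].ravel()))
    return int(border.mean())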
Related
I have downloaded a number of images (1000) from a website but they each have a black and white ruler running along 1 or 2 edges and some have these catalogue number tickets. I need these elements removed, the ruler at the very least.
Example images of coins:
The images all have the ruler in slightly different places, so I can't just perform the same crop on them.
So I tried to remove the black and replace it with white using this code:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
im = Image.open('image-0.jpg')
im = im.convert('RGBA')
data = np.array(im) # "data" is a height x width x 4 numpy array
red, green, blue, alpha = data.T # Temporarily unpack the bands for readability
# Replace black with white
black_areas = (red < 150) & (blue < 150) & (green < 150)
data[..., :-1][black_areas.T] = (255, 255, 255) # Transpose back needed
im2 = Image.fromarray(data)
im2.show()
but it pretty much just removed half the coin as well:
I was having a read of some posts on OpenCV, but thought I'd see if there was a simpler way I'd missed first.
So I have taken a look at your problem and I have found a solution for the two images you provided. I hope it works for your other images as well, but it is always hard to tell, as it can differ on an individual basis. This solution uses OpenCV for preprocessing and contour detection to get the 2nd and 3rd largest elements in your picture (the largest is the bounding box around the edges), which should be your coins. Then I create a box around those two items, add some padding, and crop to size.
So we start off with preprocessing:
import numpy as np
import cv2
img = cv2.imread(r'<PATH TO YOUR IMAGE>')
img = cv2.resize(img, None, fx=3, fy=3)
imgray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(imgray, (5, 5), 0)
ret, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
Still rather basic: we make the image bigger so it is easier to detect contours, then we turn it into grayscale, blur it, and apply thresholding so all grey values become either white or black. This then gives us the following image:
We now do contour detection, get the areas around our contours and sort them by the biggest area. Then we drop the biggest one as it is the box around the whole image and take the 2nd and 3rd biggest. And then get the x,y,w,h values we are interested in.
contours, hierarchy = cv2.findContours(
    thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
areas = []
for cnt in contours:
    area = cv2.contourArea(cnt)
    areas.append((area, cnt))
areas.sort(key=lambda x: x[0], reverse=True)
areas.pop(0)
x, y, w, h = cv2.boundingRect(areas[0][1])
x2, y2, w2, h2 = cv2.boundingRect(areas[1][1])
If we draw a rectangle around those contours:
Now we take those coordinates and create a box around both of them. This might need some minor adjusting: I just took the bigger width of the two rather than the corresponding one for the right coin, but since I added extra padding it should be fine in most cases. And finally crop to size:
pad = 15
img = img[(min(y, y2) - pad) : (max(y, y2) + max(h, h2) + pad),
(min(x, x2) - pad) : (max(x, x2) + max(w, w2) + pad)]
I hope this helps you understand how you could achieve what you want. I tried it on both your images and it worked well for them. It might need some adjustments, and depending on how your other images look, the simple approach of taking the two biggest objects (apart from the image bounding box) might need to be turned into something more sophisticated that detects the circular shapes, or something along those lines. Alternatively, you could try to detect the rulers and crop from their position inwards. You will have to decide after you have tried this on more example images from your dataset.
If you're looking for a robust solution, you should try something like Max Kaha's response, since it'll give you finer tuning.
Since the rulers tend to be left with just a little bit of text after your "black to white" filter, a quick solution is to use erosion followed by a dilation to create a mask for your images, and then apply the mask to the original image.
Pillow offers that with the ImageFilter class. Here's your code with a few modifications that'll achieve that:
from PIL import Image, ImageFilter
import numpy as np
import matplotlib.pyplot as plt
WHITE = 255, 255, 255
input_image = Image.open('image.png')
input_image = input_image.convert('RGBA')
input_data = np.array(input_image)  # "input_data" is a height x width x 4 numpy array
red, green, blue, alpha = input_data.T # Temporarily unpack the bands for readability
# Replace black with white
thresh = 30
black_areas = (red < thresh) & (blue < thresh) & (green < thresh)
input_data[..., :-1][black_areas.T] = WHITE # Transpose back needed
erosion_factor = 5
# dilation is bigger to avoid cropping the objects of interest
dilation_factor = 11
erosion_filter = ImageFilter.MaxFilter(erosion_factor)
dilation_filter = ImageFilter.MinFilter(dilation_factor)
eroded = Image.fromarray(input_data).filter(erosion_filter)
dilated = eroded.filter(dilation_filter)
mask_threshold = 220
# the mask is black on regions to be hidden
mask = dilated.convert('L').point(lambda x: 255 if x < mask_threshold else 0)
# create base image
output_image = Image.new('RGBA', input_image.size, WHITE)
# paste only the desired regions
output_image.paste(input_image, mask=mask)
output_image.show()
You should also play around with the black to white threshold and the erosion/dilation factors to try and find the best fit for most of your images.
I am trying to increase the region of interest of an image using the below algorithm.
First, the set of pixels of the exterior border of the ROI is determined, i.e., pixels that are outside the ROI and are neighbors (using four-neighborhood) to pixels inside it. Then, each pixel value of this set is replaced with the mean value of its neighbors (this time using eight-neighborhood) inside the ROI. Finally, the ROI is expanded by inclusion of this altered set of pixels. This process is repeated and can be seen as artificially increasing the ROI.
The pseudocode is below -
while there are border pixels:
    border_pixels = []

    # find the border pixels
    for each pixel p=(i, j) in image:
        if p is not in ROI and ((i+1, j) in ROI or (i-1, j) in ROI or (i, j+1) in ROI or (i, j-1) in ROI or (i-1, j-1) in ROI or (i+1, j+1) in ROI):
            add p to border_pixels

    # calculate the averages
    for each pixel p in border_pixels:
        color_sum = 0
        count = 0
        for each pixel n in 8-neighborhood of p:
            if n in ROI:
                color_sum += color(n)
                count += 1
        color(p) = color_sum / count

    # update the ROI
    for each pixel p=(i, j) in border_pixels:
        set p to be in ROI
Below is my code
img = io.imread(path_dir)
newimg = np.zeros((584, 565, 3))
mask = img == 0
while(1):
    border_pixels = []
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            for k in range(0, 3):
                if(i+1 <= 583 and j+1 <= 564 and i-1 >= 0 and j-1 >= 0):
                    if ((mask[i][j][k]) and ((mask[i+1][j][k] == False) or (mask[i-1][j][k] == False) or (mask[i][j+1][k] == False) or (mask[i][j-1][k] == False) or (mask[i-1][j-1][k] == False) or (mask[i+1][j+1][k] == False))):
                        border_pixels.append([i, j, k])
    if len(border_pixels) == 0:
        break
    for (each_i, each_j, each_k) in border_pixels:
        color_sum = 0
        count = 0
        eight_neighbourhood = [[each_i-1, each_j], [each_i+1, each_j], [each_i, each_j-1], [each_i, each_j+1], [each_i-1, each_j-1], [each_i-1, each_j+1], [each_i+1, each_j-1], [each_i+1, each_j+1]]
        for pix_i, pix_j in eight_neighbourhood:
            if (mask[pix_i][pix_j][each_k] == False):
                color_sum += img[pix_i, pix_j, each_k]
                count += 1
        print(color_sum // count)
        img[each_i][each_j][each_k] = (color_sum // count)
    for (i, j, k) in border_pixels:
        mask[i, j, k] = False
        border_pixels.remove([i, j, k])
io.imsave("tryout6.png", img)
But it is not making any change in the image; I am getting the same image as before.
So I tried plotting the border pixels on a black image of the same dimensions for the first iteration, and I am getting the result below:
I really don't have any idea where I am going wrong here.
Here's a solution that I think works as you have requested (although I agree with @Peter Boone that it will take a while). My implementation has a triple loop, but maybe someone else can make it faster!
First, read in the image. With my method, the pixel values are floats between 0 and 1 (rather than integers between 0 and 255).
import urllib.request

import matplotlib.pyplot as plt
import numpy as np
from skimage.morphology import binary_dilation, binary_erosion, disk
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu

# create a file-like object from the url
f = urllib.request.urlopen("https://i.stack.imgur.com/JXxJM.png")

# read the image file in a numpy array
# note that all pixel values are between 0 and 1 in this image
a = plt.imread(f)
Second, add some padding around the edges, and threshold the image. I used Otsu's method, but @Peter Boone's answer works well, too.
# add black padding around image 100 px wide
a = np.pad(a, ((100,100), (100,100), (0,0)), mode = "constant")
# convert to greyscale and perform Otsu's thresholding
grayscale = rgb2gray(a)
global_thresh = threshold_otsu(grayscale)
binary_global1 = grayscale > global_thresh
# define number of pixels to expand the image
num_px_to_expand = 50
The image binary_global1 is a mask that looks like this:
Since the image is three channels (RGB), I process the channels separately. I noticed that I needed to erode the image by ~5 px because the outside of the image has some unusual colors and patterns.
# process each channel (RGB) separately
for channel in range(a.shape[2]):

    # select a single channel
    one_channel = a[:, :, channel]

    # reset binary_global for each channel
    binary_global = binary_global1.copy()

    # erode by 5 px to get rid of unusual edges from original image
    binary_global = binary_erosion(binary_global, disk(5))

    # turn everything less than the threshold to 0
    one_channel = one_channel * binary_global

    # update pixels one at a time
    for jj in range(num_px_to_expand):

        # get a 1 px ring of pixels to update
        px_to_update = np.logical_xor(binary_dilation(binary_global, disk(1)),
                                      binary_global)

        # update those pixels with the average of their neighborhood
        xs, ys = np.where(px_to_update == 1)

        for x, y in zip(xs, ys):

            # make 3 x 3 px slices
            slices = np.s_[(x-1):(x+2), (y-1):(y+2)]

            # update a single pixel
            one_channel[x, y] = (np.sum(one_channel[slices] *
                                        binary_global[slices]) /
                                 np.sum(binary_global[slices]))

        # update original image
        a[:, :, channel] = one_channel

        # increase binary_global by 1 px dilation
        binary_global = binary_dilation(binary_global, disk(1))
When I plot the output, I get something like this:
# plot image
plt.figure(figsize=[10,10])
plt.imshow(a)
This is an interesting idea. You're going to want to use masks and some form of mean rank filters to accomplish this. Going pixel by pixel will take you a while; instead, you want to use different convolution filters.
If you do something like this:
from skimage import io
from skimage.filters.rank import mean_bilateral
from skimage.morphology import binary_dilation, binary_erosion, square

image = io.imread("roi.jpg")
mask = image[:, :, 0] < 30
just_inside = binary_dilation(mask) ^ mask
image[~just_inside] = [0, 0, 0]
you will have a mask representing just the pixels inside of the ROI. I also set the pixels not in that area to 0,0,0.
Then you can get the pixels just outside of the ROI:
just_outside = binary_erosion(mask) ^ mask
Then get the mean bilateral of each channel:
mean_blue = mean_bilateral(image[:,:,0], selem=square(3), s0=1, s1=255)
#etc...
This isn't exactly correct, but I think it should put you in the right direction. I would check out image.sc if you have more general questions about image processing. Let me know if you need more help, as this was more a general direction than working code.
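To make that direction slightly more concrete, here is a hedged sketch of how the pieces might fit together, assuming the image, just_outside mask, and imports from the snippets above; it copies each channel's bilateral mean onto the ring of pixels:
for c in range(3):
    # smooth one channel, then copy the smoothed values onto the ring
    smoothed = mean_bilateral(image[:, :, c], square(3), s0=1, s1=255)
    image[just_outside, c] = smoothed[just_outside]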
So I am trying to find pixels which are not white and create a bounding box around the image by examining the colors. I want to get the topmost, bottommost, leftmost and rightmost non-white pixels, and use them to create a bounding box. I have used four loops to travel through each side. I also want to remove the background color (the background color is mostly grey) and change it to pure white. I have implemented all the functionality, but now, since I am using a lot of loops, the code runs too slow. I need to optimize the loops while still having the functionality of finding the topmost, bottommost, leftmost and rightmost non-white pixels and removing the colors. How can I do it?
The code below shows what I am doing to get the bounding box along with background removal at the same time. The mask is a black and white version of the image. If mask[i][j] == 0 then the pixel is a different color, so I take the value and compare it with the values stored in p; that helps me find the bounding box. If mask[i][j] != 0 then I change the values of the image to white.
# for bounding box
p = []
p.append(5000)
p.append(0)
p.append(5000)
p.append(0)
for i in range(0, height):
    for j in range(0, width):
        if mask[i][j] == 0:
            if j < p[0]:
                p[0] = j
            break
        else:
            img[i, j] = [255, 255, 255]
for i in range(0, height):
    for j in reversed(range(0, width)):
        if mask[i][j] == 0:
            if j > p[1]:
                p[1] = j
            break
        else:
            img[i, j] = [255, 255, 255]
# top/down
for i in range(0, width):
    for j in range(0, height):
        if mask[j][i] == 0:
            if j < p[2]:
                p[2] = j
            break
        else:
            img[j, i] = [255, 255, 255]
for i in reversed(range(0, width)):
    for j in reversed(range(0, height)):
        if mask[j][i] == 0:
            if j > p[3]:
                p[3] = j
            break
        else:
            img[j, i] = [255, 255, 255]
So how can I optimize these loops while still getting the same functionality of getting pixel values and being able to change the color of some other image?
Background
To make the background white you can use a bitwise operation with the mask. To automate the creation of a mask read here.
Example:
import cv2
import numpy as np
# load image and mask
img = cv2.imread('image.png')
mask = cv2.imread('mask.png')
# combine images
res = cv2.bitwise_or(img,mask)
cv2.imshow("result", res)
cv2.waitKey(0)
cv2.destroyAllWindows()
The mask needs to have the same number of color channels as the image. All white areas in the mask will also become white in the image. Black areas in the mask will leave the image unaffected.
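If your mask is single-channel, a quick hedged sketch of giving it the same number of channels first:
# expand a single-channel mask to 3 channels so it matches the image
mask3 = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
res = cv2.bitwise_or(img, mask3)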
Bounding box
To get the bounding box you could use findContours. It takes a binary mask as input and returns a list of contours. You can use a contour to find the bounding box, rotated bounding box or minimum enclosing circle. The result may not be perfect depending on your input, but you can use it to increase performance, as it greatly narrows the search.
Note: the input to findContours should have a black background. You can modify your mask using inverted_mask = cv2.bitwise_not(mask). Or, if you obtained your mask using thresholding, you can choose an inverted threshold type.
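For example, a short sketch of the inverted threshold option (gray and the value 127 are placeholders here):
# THRESH_BINARY_INV yields a mask with a black background directly
ret, inverted_mask = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)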
Result:
Code:
import cv2
import numpy as np

# load image // use your mask instead
mask = cv2.imread('mask.png', 0)

# create a color copy of the mask to draw the results on
mask2 = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)

# find contours
contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    # get the bounding rect and draw a red line over it
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(mask2, (x, y), (x + w, y + h), (0, 0, 255), 3)
    # get the minimum enclosing rectangle and draw it in blue
    rect = cv2.minAreaRect(cnt)
    box = cv2.boxPoints(rect)
    box = np.int0(box)
    cv2.drawContours(mask2, [box], 0, (255, 0, 0), 3)

# display result
cv2.imshow("img", mask2)
cv2.waitKey(0)
cv2.destroyAllWindows()
Maths
If you'd rather stick to checking array values, you can boost performance by first summing the rows and the columns. Summing is fast (and baked into numpy), and then you can discard rows/columns by checking a single value. You can see an example of this process in this answer. I would suggest using the mask with a black background for this, as you can compare the sum with zero. This will essentially result in the red bounding box above. Of course, when a non-zero row/col is found, you will still have to loop over that one to find the exact coordinate.
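A minimal sketch of that summing idea, assuming mask is a 2D numpy array that is zero on the background and non-zero on content:
import numpy as np

rows = mask.sum(axis=1)  # one value per row
cols = mask.sum(axis=0)  # one value per column
nonzero_rows = np.flatnonzero(rows)
nonzero_cols = np.flatnonzero(cols)
top, bottom = nonzero_rows[0], nonzero_rows[-1]
left, right = nonzero_cols[0], nonzero_cols[-1]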
Phew, that turned out much longer than intended...
@JohnColeman has a point. Nested Python loops will be relatively slow even with the best algorithms, and there are libraries that can optimize such operations.
The algorithm itself could be sped up by using the results of each loop to limit the range of the following loop. For example, suppose that when looking for the top non-white pixel, you scanned from top to bottom as the outer loop and left to right as the inner loop, and found a pixel (a, b) (with a the distance from the top). Then, when you next go looking for the left pixel, you know you can start scanning from a+1 in the top-down outer loop, and go no further than b-1 in the left-right inner loop. Let's call the result (c, d).
Similarly the bottom pixel can be no less than c in the vertical and d in the horizontal.
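As an illustration of the early-exit part of that idea, here is a minimal hypothetical sketch for the first scan (assumes mask is 2D and 0 marks non-white content, as in the question):
def find_top(mask):
    # scan top to bottom; stop at the first content pixel found
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i][j] == 0:
                return i, j
    return None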
I have an RGBA image where I have to find if any pixel has red value < 150 and to replace such pixels to black. I am using following code for this:
import numpy as np

imgarr = np.array(img)
for x in range(imgarr.shape[0]):
    for y in range(imgarr.shape[1]):
        if imgarr[x, y][0] < 150:  # red value < 150
            imgarr[x, y] = (0, 0, 0, 255)
However, this is a slow loop and I am sure it can be optimized using some function such as numpy.where, but I am not able to fit it in this code. How can this be solved?
For a one-channel image, we can do as follows:
out_val = 0
gray = cv2.imread("colour.png",0)
gray[gray<value] = out_val
Use np.where with the mask of comparison against the threshold -
img = np.asarray(img)
imgarr = np.where(img[...,[0]]<150,(0,0,0,255),img)
We are using img[...,[0]] to keep the number of dims as needed for broadcasted assignment with np.where. So, another way would be to use img[...,0,None]<150 to get the mask that keeps dims.
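An equivalent in-place variant uses a plain boolean mask, which keeps the original array and its dtype; a small sketch of the same idea:
# build the mask once, then assign the replacement color in place
mask = imgarr[..., 0] < 150
imgarr[mask] = (0, 0, 0, 255)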
I'm new to OpenCV. I've managed to detect the object and place an ROI around it, but I can't manage to detect whether the object is black or white. I've found something, I think, but I don't know if it is the right solution. The function should return True or False depending on whether it's black or white. Anyone have experience with this?
def filter_color(img):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    lower_black = np.array([0, 0, 0])
    upper_black = np.array([350, 55, 100])
    black = cv2.inRange(hsv, lower_black, upper_black)
If you are certain that the ROI is going to be basically black or white and not worried about misidentifying something, then you should be able to just average the pixels in the ROI and check if it is above or below some threshold.
In the code below, after you set an ROI using the newer numpy method, you can pass the roi/image into the method as if you were passing a full image.
Copy-Paste Sample
import cv2
import numpy as np

def is_b_or_w(image, black_max_bgr=(40, 40, 40)):
    # use this if you want to check channels are all basically equal
    # I split this up into small steps to find out where your error is coming from
    mean_bgr_float = np.mean(image, axis=(0, 1))
    mean_bgr_rounded = np.round(mean_bgr_float)
    mean_bgr = mean_bgr_rounded.astype(np.uint8)
    # use this if you just want a simple threshold for simple grayscale
    # or if you want to use an HSV (V) measurement as in your example
    mean_intensity = int(round(np.mean(image)))
    return 'black' if np.all(mean_bgr < black_max_bgr) else 'white'

# make a test image for ROIs
shape = (10, 10, 3)  # 10x10 BGR image
im_blackleft_white_right = np.zeros(shape, dtype=np.uint8)
im_blackleft_white_right[:, 0:4] = 10
im_blackleft_white_right[:, 5:9] = 255

roi_darkgray = im_blackleft_white_right[:, 0:4]
roi_white = im_blackleft_white_right[:, 5:9]

# test them with ROI
print('dark gray image identified as: {}'.format(is_b_or_w(roi_darkgray)))
print('white image identified as: {}'.format(is_b_or_w(roi_white)))

# output
# dark gray image identified as: black
# white image identified as: white
I don't know if this is the right approach but it worked for me.
import math

Thres = 50
h, w = img.shape[:2]
black = 0
not_black = 0
for y in range(h):
    for x in range(w):
        pixel = img[y][x]
        # cast to int to avoid uint8 overflow when squaring
        d = math.sqrt(int(pixel[0])**2 + int(pixel[1])**2 + int(pixel[2])**2)
        if d < Thres:
            black = black + 1
        else:
            not_black = not_black + 1
This one worked for me, but like I said, I don't know if it is the right approach. It demands a lot of processing power, therefore I defined an ROI which is much smaller. The Thres is currently hard-coded...
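As a hedged follow-up sketch, the same count can be vectorized with numpy, which removes the per-pixel loop entirely (assuming the same img and Thres as above):
import numpy as np

# Euclidean distance of every pixel from pure black, computed in one shot
dist = np.linalg.norm(img.astype(np.float32), axis=2)
black = int(np.count_nonzero(dist < Thres))
not_black = dist.size - black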