I am trying to remove a transparent watermark from an image.
Here is my sample image:
I would like to remove the text "Watermark" from the image. As you can see, the text is transparent. So I would like to replace that text to the original background.
Something like this would be my desired output:
I tried some examples (I am currently using cv2, if other libraries can solve the problem please also recommend), but none of them where near from succeeding. I know the way to go would be to have a mask (like in this post), but they all already have masked images, but I don't.
Here is what I tried to do to have a mask, I turned down the saturation to black and white, and created an image "imagemask.jpg", then tried going through the pixels with a for loop:
mask = cv2.imread('imagemask.jpg')
new = []
rows, cols, _ = mask.shape
for i in range(rows):
new.append([])
#print(i)
for j in range(cols):
k = img[i, j]
#print(k)
if all(x in range(110, 130) for x in k):
new[-1].append((255, 255, 255))
else:
new[-1].append((0, 0, 0))
cv2.imwrite('finalmask.jpg', np.array(new))
Then after that wanted to use the code for the mask, but I realized the "finalmask.jpg" is a complete mess... so I didn't try using the code for the mask.
Is this actually possible? I have been trying for around 3 hours but receiving no luck...
This is not trivial, my friend. To add insult to injury, your image is very low-res, compressed and has a nasty glare - that won't help processing at all. Please, look at your input and set your expectations accordingly. With that said, let's try to get the best result with what we have. These are the steps I propose:
Try to segment the watermark text from the image
Filter the segmentation mask and try to get a binary mask as clean as possible
Use the text mask to in-paint the offending area using the input image as reference
Now, the tricky part, as you already saw, is segmenting the text. After trying out some techniques and color spaces, I found that the CMYK color space - particularly the K channel - offers promising results. The text is reasonably clear and we can try an Adaptive Thresholding on this, let's take a look:
# Imports
import cv2
import numpy as np
# Read image
imagePath = "D://opencvImages//"
img = cv2.imread(imagePath+"0f5zZm.jpg")
# Store a deep copy for the inpaint operation:
originalImg = img.copy()
# Convert to float and divide by 255:
imgFloat = img.astype(np.float) / 255.
# Calculate channel K:
kChannel = 1 - np.max(imgFloat, axis=2)
OpenCV does not offer BGR to CMYK conversion directly, so I manually had to get the K channel using the conversion formula. It is very straightforward. The K (or Key) channel represents pixels of the lowest intensity (black) with color white. Meaning that the text, which is almost white, will be rendered in black... This is the K Channel of the input:
You see how the darker pixels on the input are almost white here? That's nice, it seems to get a clear separation between the text and everything else. It's a shame that we have some big nasty glare on the right side. Anyway, the conversion involves float operations, so gotta be careful with data types. Maybe we can improve this image with a little brightness/contrast adjustment. Just a little bit, I'm just trying to separate more the text from that nasty glare:
# Apply a contrast/brightness adjustment on Channel K:
alpha = 0
beta = 1.2
adjustedK = cv2.normalize(kChannel, None, alpha, beta, cv2.NORM_MINMAX, cv2.CV_32F)
# Convert back to uint 8:
adjustedK = (255*adjustedK).astype(np.uint8)
This is the adjusted image:
There's a little bit more separation between the text and the glare, it seems. Alright, let's apply an Adaptive Thresholding on this bad boy to get an initial segmentation mask:
# Adaptive Thresholding on adjusted Channel K:
windowSize = 21
windowConstant = 11
binaryImg = cv2.adaptiveThreshold(adjustedK, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, windowSize, windowConstant)
You see I'm using a not-so-big windowSize here for the thresholding? Feel free to tune out these parameters if you like. This is the binary image I get:
Yeah, there's a lot of noise. Here's what I propose to get a cleaner mask: There's some obvious blobs that are bigger than the text. Likewise, there are other blobs that are smaller than the text. Let's locate the big blobs and the small blobs and subtract them. The resulting image should contain the text, if we set our parameters correctly. Let's see:
# Get the biggest blobs on the image:
minArea = 180
bigBlobs = areaFilter(minArea, binaryImg)
# Filter the smallest blobs on the image:
minArea = 20
smallBlobs = areaFilter(minArea, binaryImg)
# Let's try to isolate the text:
textMask = smallBlobs - bigBlobs
cv2.imshow("Text Mask", textMask)
cv2.waitKey(0)
Here I'm using a helper function called areaFilter. This function returns all the blobs of an image that are above a minimum area threshold. I'll post the function at the end of the answer. In the meantime, check out these cool images:
Big blobs:
Filtered small blobs:
The difference between them:
Sadly, it seems that some portions of the characters didn't survive the filtering operations. That's because the intersection of the glare and the text is too much for the algorithm to get a clear separation. Something that could benefit the result of the in-painting is a subtle blur on this mask, to get rid of that compression alias. Let's apply some Gaussian Blur to smooth the mask a little bit:
# Blur the mask a little bit to get a
# smoother inpanting result:
kernelSize = (3, 3)
textMask = cv2.GaussianBlur(textMask, kernelSize, cv2.BORDER_DEFAULT)
The kernel is not that big, I just want a subtle effect. This is the result:
Finally, let's apply the in-painting:
# Apply the inpaint method:
inpaintRadius = 10
inpaintMethod = cv2.INPAINT_TELEA
result = cv2.inpaint(originalImg, textMask, inpaintRadius, inpaintMethod)
cv2.imshow("Inpaint Result", result)
cv2.waitKey(0)
This is the final result:
Well, is not that bad, considering the input image. You can try to further improve the result adjusting some values, but the reality of this life, my dude, is that the input image is not that great to begin with. Here's the areaFilter function:
def areaFilter(minArea, inputImage):
# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(inputImage, connectivity=4)
# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][4] >= minArea]
# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels) == True, 255, 0).astype('uint8')
return filteredImage
Related
I want to use pytesseract to read digits from images. The images look as follows:
The digits are dotted and in order to be able to use pytesseract, I need black connected digits on a white background. To do so, I thought about using erode and dilate as preprocessing techniques. As you can see, the images are similar, yet quite different in certain aspects. For example, the dots in the first image are darker than the background, while the dots in the second are whiter. That means, in the first image I can use erode to get black connected lines and in the second image I can use dilate to get white connected lines and then inverse the colors. This leads to the following results:
Using an appropriate threshold, the first image can easily be read with pytesseract. The second image, whoever, is more tricky. The problem is, that for example parts of the "4" are darker than the background around the three. So a simple threshold is not going to work. I need something like local threshold or local contrast enhancement. Does anybody have an idea here?
Edit:
OTSU, mean threshold and gaussian threshold lead to the following results:
Your images are pretty low res, but you can try a method called gain division. The idea is that you try to build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant during most of the image.
After gain division is performed, you can try to improve the image by applying an area filter and morphology. I only tried your first image, because it is the "least worst".
These are the steps to get the gain-divided image:
Apply a soft median blur filter to get rid of high frequency noise.
Get the model of the background via local maximum. Apply a very strong close operation, with a big structuring element (I’m using a rectangular kernel of size 15).
Perform gain adjustment by dividing 255 between each local maximum pixel. Weight this value with each input image pixel.
You should get a nice image where the background illumination is pretty much normalized, threshold this image to get a binary mask of the characters.
Now, you can improve the quality of the image with the following, additional steps:
Threshold via Otsu, but add a little bit of bias. (This, unfortunately, is a manual step depending on the input).
Apply an area filter to filter out the smaller blobs of noise.
Let's see the code:
import numpy as np
import cv2
# image path
path = "C:/opencvImages/"
fileName = "iA904.png"
# Reading an image in default mode:
inputImage = cv2.imread(path+fileName)
# Remove small noise via median:
filterSize = 5
imageMedian = cv2.medianBlur(inputImage, filterSize)
# Get local maximum:
kernelSize = 15
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(imageMedian, cv2.MORPH_CLOSE, maxKernel, None, None, 1, cv2.BORDER_REFLECT101)
# Perform gain division
gainDivision = np.where(localMax == 0, 0, (inputImage/localMax))
# Clip the values to [0,255]
gainDivision = np.clip((255 * gainDivision), 0, 255)
# Convert the mat type from float to uint8:
gainDivision = gainDivision.astype("uint8")
# Convert RGB to grayscale:
grayscaleImage = cv2.cvtColor(gainDivision, cv2.COLOR_BGR2GRAY)
This is what gain division gets you:
Note that the lighting is more balanced. Now, let's apply a little bit of contrast enhancement:
# Contrast Enhancement:
grayscaleImage = np.uint8(cv2.normalize(grayscaleImage, grayscaleImage, 0, 255, cv2.NORM_MINMAX))
You get this, which creates a little bit more contrast between the foreground and the background:
Now, let's try to threshold this image to get a nice, binary mask. As I suggested, try Otsu's thresholding but add (or subtract) a little bit of bias to the result. This step, as mentioned, is dependent on the quality of your input:
# Threshold via Otsu + bias adjustment:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
threshValue = 0.9 * threshValue
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)
You end up with this binary mask:
Invert this and filter out the small blobs. I set an area threshold value of 10 pixels:
# Invert image:
binaryImage = 255 - binaryImage
# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(binaryImage, connectivity=4)
# Set the minimum pixels for the area filter:
minArea = 10
# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][4] >= minArea]
# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels) == True, 255, 0).astype("uint8")
And this is the final binary mask:
If you plan on sending this image to an OCR, you might want to apply some morphology first. Maybe a closing to try and join the dots that make up the characters. Also be sure to train your OCR classifier with a font that is close to what you are actually trying to recognize. This is the (inverted) mask after a size 3 rectangular closing operation with 3 iterations:
Edit:
To get the last image, process the filtered output as follows:
# Set kernel (structuring element) size:
kernelSize = 3
# Set operation iterations:
opIterations = 3
# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel, None, None, opIterations, cv2.BORDER_REFLECT101)
# Invert image to obtain black numbers on white background:
closingImage = 255 - closingImage
Say I have 2 white images (RGB 800x600 image) that is 'dirty' at some unknown positions, I want to create a final combined image that has all the dirty parts of both images.
Just adding the images together reduces the 'dirtyness' of each blob, since I half the pixel values and then add them (to stay in the 0->255 rgb range), this is amplified when you have more than 2 images.
What I want to do is create a mask for all relatively white pixels in the 3 channel image, I've seen that if all RGB values are within 10-15 of each other, a pixel is relatively white. How would I create this mask using numpy?
Pseudo code for what I want to do:
img = cv2.imread(img) #BGR image
mask = np.where( BGR within 10 of each other)
Then I can use the first image, and replace pixels on it where the second picture is not masked, keeping the 'dirtyness level' relatively dirty. (I know some dirtyness of the second image will replace that of the first, but that's okay)
Edit:
People asked for images so I created some sample images, the white would not always be so exactly white as in these samples which is why I need to use a 'within 10 BGR' range.
Image 1
Image 2
Image 3 (combined, ignore the difference in yellow blob from image 2 to here, they should be the same)
What you asked for is having the pixels in which the distance between colors is under 10.
Here it is, translated to numpy.
img = cv2.imread(img) # assuming rgb image in naming
r = img[:, :, 0]
g = img[:, :, 1]
b = img[:, :, 2]
rg_close = np.abs(r - g) < 10
gb_close = np.abs(g - b) < 10
br_close = np.abs(b - r) < 10
all_close = np.logical_and(np.logical_and(rg_close, gb_close), br_close)
I do believe, however, that this is not what you REALLY want.
I think what you want in a mask that segments the background.
This is actually simpler, assuming the background is completely white:
img = cv2.imread(img)
background_mask = 245 * 3 < img[: ,: ,0] + img[: ,: ,1] + img[: ,: ,2]
Please note this code required thresholding games, and only shows a concept.
I would suggest you convert to HSV colourspace and look for saturated (colourful) pixels like this:
import cv2
# Load background and foreground images
bg = cv2.imread('A.jpg')
fg = cv2.imread('B.jpg')
# Convert to HSV colourspace and extract just the Saturation
Sat = cv2.cvtColor(fg, cv2.COLOR_BGR2HSV)[..., 1]
# Find best (Otsu) threshold to divide black from white, and apply it
_ , mask = cv2.threshold(Sat,0,1,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# At each pixel, choose foreground where mask is set and background elsewhere
res = np.where(mask[...,np.newaxis], fg, bg)
# Save the result
cv2.imwrite('result.png', res)
Note that you can modify this if it picks up too many or too few coloured pixels. If it picks up too few, you could dilate the mask and if it it picks up too many, you could erode the mask. You could also blur the image a little bit before masking which might not be a bad idea as it is a "nasty" JPEG with compression artefacts in it. You could change the saturation test and make it more clinical and targeted if you only wanted to allow certain colours through, or a certain brightness or a comnbination.
I have an image processing problem that I can't solve. I have a set of 375 images like the one below (1). I'm trying to remove the background, so to make "background substraction" (or "foreground extraction") and get only the waste on a plain background (black/white/...).
(1) Image example
I tried many things, including createBackgroundSubtractorMOG2 from OpenCV, or threshold. I also tried to remove the background pixel by pixel by subtracting it from the foreground because I have a set of 237 background images (2) (the carpet without the waste, but which is a little bit offset from the image with the objects). There are also variations in brightness on the background images.
(2) Example of a background image
Here is a code example that I was able to test and that gives me the results below (3) and (4). I use Python 3.8.3.
# Function to remove the sides of the images
def delete_side(img, x_left, x_right):
for i in range(img.shape[0]):
for j in range(img.shape[1]):
if j<=x_left or j>=x_right:
img[i,j] = (0,0,0)
return img
# Intialize the background model
backSub = cv2.createBackgroundSubtractorMOG2(history=250, varThreshold=2, detectShadows=True)
# Read the frames and update the background model
for frame in frames:
if frame.endswith(".png"):
filepath = FRAMES_FOLDER + '/' + frame
img = cv2.imread(filepath)
img_cut = delete_side(img, x_left=190, x_right=1280)
gray = cv2.cvtColor(img_cut, cv2.COLOR_BGR2GRAY)
mask = backSub.apply(gray)
newimage = cv2.bitwise_or(img, img, mask=mask)
img_blurred = cv2.GaussianBlur(newimage, (5, 5), 0)
gray2 = cv2.cvtColor(img_blurred, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray2, 10, 255, cv2.THRESH_BINARY)
final = cv2.bitwise_or(img, img, mask=binary)
newpath = RESULT_FOLDER + '/' + frame
cv2.imwrite(newpath, final)
I was inspired by many other cases found on Stackoverflow or others (example: removing pixels less than n size(noise) in an image - open CV python).
(3) The result obtained with the code above
(4) Result when increasing the varThreshold argument to 10
Unfortunately, there is still a lot of noise on the resulting pictures.
As a beginner in "background substraction", I don't have all the keys to get an optimal solution. If someone would have an idea to do this task in a more efficient and clean way (Is there a special method to handle the case of transparent objects? Can noise on objects be eliminated more effectively? etc.), I'm interested :)
Thanks
Thanks for your answers. For information, I simply change of methodology and use a segmentation model (U-Net) with 2 labels (foreground, background), to identify the background. It works quite well.
I need to find the largest empty area in the document and display its coordinates, center point and area, using python to put a QR Code there.
I think OpenCV and Numpy should be enough for this task.
What kinda THRESH to use? Because there are a lot of types of scans:
gray, BW, with color, and how to find the contour properly?
How this can be implemented in the fastest way? An example using the
first scan from google is attached, where you can see that the code
should find the largest empty square area.
#Mark Setchell Thanks! This code works perfectly for all docs with a white background, but when I use smth with a color in the background it finds a completely different area. Also, to keep thin lines in the docs I used Erode after thresholding. Tried to change thresholding and erode parameters, still not working properly.
Edited post, added color pictures.
Here's a possible approach:
#!/usr/bin/env python3
import cv2
import numpy as np
def largestSquare(im):
# Make image square of 100x100 to simplify and speed up
s = 100
work = cv2.resize(im, (s,s), interpolation=cv2.INTER_NEAREST)
# Make output accumulator - uint16 is ok because...
# ... max value is 100x100, i.e. 10,000 which is less than 65,535
# ... and you can make a PNG of it too
p = np.zeros((s,s), np.uint16)
# Find largest square
for i in range(1, s):
for j in range(1, s):
if (work[i][j] > 0 ):
p[i][j] = min(p[i][j-1], p[i-1][j], p[i-1][j-1]) + 1
else:
p[i][j] = 0
# Save result - just for illustration purposes
cv2.imwrite("result.png",p)
# Work out what the actual answer is
ind = np.unravel_index(np.argmax(p, axis=None), p.shape)
print(f'Location: {ind}')
print(f'Length of side: {p[ind]}')
# Load image and threshold
im = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)
_, thr = cv2.threshold(im,127,255,cv2.THRESH_BINARY | cv2.THRESH_OTSU)
# Get largest white square
largestSquare(thr)
Output
Location: (21, 77)
Length of side: 18
Notes:
I edited out your red annotation so it didn't interfere with my algorithm.
I did Otsu thresholding to get pure black and white - that may or may not be appropriate to your use case. It will depend on your scans and paper background etc.
I scaled the image down to 100x100 so it doesn't take all day to run. You will need to scale the results back up to the size of your original image but I assume you can do that easily enough.
Keywords: Image processing, image, Python, OpenCV, largest white square, largest empty space.
I'm trying to implement binary image filter (to get monochrome binary image) using python & PyQT5, and, to retrieve new pixel colors I use the following method:
def _new_pixel_colors(self, x, y):
color = QColor(self.pixmap.pixel(x, y))
result = qRgb(0, 0, 0) if all(c < 127 for c in color.getRgb()[:3]) else qRgb(255, 255, 255)
return result
Could It be a correct sample of binary filter for RGB image? I mean, is that a sufficient condition to check whether the pixel is brighter or darker then (127,127,127) Gray color?
And please, do not provide any solutions with opencv, pillow, etc. I'm only asking about the algorithm itself.
I would at least compare against intensity i=R+G+B ...
For ROI like masks you can use any thresholding techniques (adaptive thresholding is the best) but if your resulting image is not a ROI mask and should resemble the visual features of the original image then the best conversion I know of is to use Dithering.
The Idea behind BW dithering is to convert gray scales into BW patterns preserwing the shading. The result is often noisy but preserves much much more visual details. Here simple naive C++ dithering (sorry not a Python coder):
picture pic0,pic1;
// pic0 - source img
// pic1 - output img
int x,y,i;
color c;
// resize output to source image size clear with black
pic1=pic0; pic1.clear(0);
// dithering
i=0;
for (y=0;y<pic0.ys;y++)
for (x=0;x<pic0.xs;x++)
{
// get source pixel color (AARRGGBB)
c=pic0.p[y][x];
// add to leftovers
i+=WORD(c.db[picture::_r]); // _r,_g,_b are just constants 0,1,2
i+=WORD(c.db[picture::_g]);
i+=WORD(c.db[picture::_b]);
// threshold white intensity is 255+255+255=765
if (i>=384){ i-=765; c.dd=0x00FFFFFF; } else c.dd=0;
// copy to destination image
pic1.p[y][x]=c;
}
So its the same as in the link above but using just black and white. i is the accumulated intensity to be placed on the image. xs,ys is the resolution and c.db[] is color channel access.
If I apply this on colored image like this:
The result looks like this:
As you can see all the details where preserved but a noisy patterns emerge ... For printing purposes was sometimes the resolution of the image multiplied to enhance the quality. If you change the naive 2 nested for loops with a better pattern (like 16x16 squares etc) then the noise will be conserved near its source limiting artifacts. There are also approaches that use pseudo random patterns (put the leftover i near its source pixel in random location) that is even better ...
But for a BW dithering even naive approach is enough as the artifacts are just one pixel in size. For colored dithering the artifacts could create unwanted horizontal line patterns of several pixels in size (depends on used palette mis match the worse palette the bigger artifacts...)
PS just for comparison to other answer threshold outputs this is the same image dithered:
Image thresholding is the class of algorithms you're looking for - a binary threshold would set pixels to 0 or 1, yes.
Depending on the desired output, consider converting your image first to other color spaces, in particular HSL, with the luminance channel. Using (127, 127, 127) as a threshold does not uniformly take brightness into account because each channel of RGB is the saturation of R, G, or B; consider this image:
from PIL import Image
import colorsys
def threshold_pixel(r, g, b):
h, l, s = colorsys.rgb_to_hls(r / 255., g / 255., b / 255.)
return 1 if l > .36 else 0
# return 1 if r > 127 and g > 127 and b > 127 else 0
def hlsify(img):
pixels = img.load()
width, height = img.size
# Create a new blank monochrome image.
output_img = Image.new('1', (width, height), 0)
output_pixels = output_img.load()
for i in range(width):
for j in range(height):
output_pixels[i, j] = threshold_pixel(*pixels[i, j])
return output_img
binarified_img = hlsify(Image.open('./sample_img.jpg'))
binarified_img.show()
binarified_img.save('./out.jpg')
There is lots of discussion on other StackExchange sites on this topic, e.g.
Binarize image data
How do you binarize a colored image?
how can I get good binary image using Otsu method for this image?