Trying to join 3 pictures

Trying to join 3 pictures - python

I am analysing images from accelerated ageing tests. I would like to display this cumulative sum of electrical discharges that happened on the surface on a diagram of what that setup of electrodes and total sample was like for reference.
Total size of the picture is equal to the size of the sample. It is a rectangle 5cm by 120cm. This is the cumulative image.
In the cumulative plot the aspects are less rectangular as all activity happens between electrodes so when previously summing frames I rejected areas that were not between electrodes in order to improve performance.
This image was generated with matplotlib with jet colouring. As such it was originally a data matrix. In order to add it to the reference picture I have saved this image as png and reloaded it. This operation changed the aspect ratio slightly but I resized it correctly later.
I have used this function to add to a white rectangle with the dimensions same as the raw footage. Initially I added scaled both top and bottom electrodes to the white background image.
def overlay(background_img, img_to_overlay_t, x, y, negpos, overlay_size=None):
bg_img = background_img.copy()
if overlay_size is not None:
img_to_overlay_t = cv.resize(img_to_overlay_t.copy(), overlay_size)
# Extract the alpha mask of the RGBA image, convert to RGB
b,g,r,a = cv.split(img_to_overlay_t)
overlay_color = cv.merge((b,g,r))
# Apply some simple filtering to remove edge noise
mask = cv.medianBlur(a,5)
#mask = np.zeros(source_img.shape[:2], np.uint8)
h, w, _ = overlay_color.shape
roi = bg_img[y:y+h, x:x+w]
## works sort of but with removed details
#mask = mask.astype(np.int8)
# Black-out the area behind the logo in our original ROI
img1_bg = cv.bitwise_and(roi.copy(),roi.copy(),mask = cv.bitwise_not(mask))
# Mask out the logo from the logo image.
img2_fg = cv.bitwise_and(overlay_color,overlay_color,mask = mask)
img2_fg = img2_fg.astype(np.float32)
img1_bg = img1_bg.astype(np.float32)
if negpos == False:
bg_img[y:y+h, x:x+w] = cv.subtract(img1_bg, img2_fg)
else:
bg_img[y:y+h, x:x+w] = cv.add(img1_bg, img2_fg)
return bg_img
It yields this diagram.
Everything worked fine up until this point. Problem was trying to add cumulative in between the electrodes. My issue comes with being unable to slot that cumulative plot into the reference layout image of the electrodes and total sample size. I get
OpenCV(3.4.1) Error: Assertion failed ((mtype == 0 || mtype == 1) && _mask.sameSize(*psrc1))
at
img1_bg = cv.bitwise_and(roi.copy(),roi.copy(),mask = cv.bitwise_not(mask))
I can fix that by casting mask to uint8 or int8
mask = mask.astype(np.uint8)
This has it's own issues as the blue discharge part ends up lossing all the features and basically consists of 3 different colours only.
This the outcome I'd like to have displayed/saved at the end. I've no idea how to get to it.

Related

Removing the underline of a URL in the image of a text message screenshot with python opencv and matplotlib

I have a screenshot received from an iPhone, both dark and light mode.
I need to use OCR to extract the URL but am unable to do so with the underlining that appears.
What would be the best way to remove the horizontal lines from the message? Except the phone number, it doesn't matter if other parts of the screenshot are distorted.
I've tried approaches as described in
Removing Horizontal Lines in image (OpenCV, Python, Matplotlib)
https://docs.opencv.org/3.2.0/d1/dee/tutorial_moprh_lines_detection.html
https://legacy.imagemagick.org/discourse-server/viewtopic.php?t=22338
And none seem to work well, at all.

Here's a possible solution for your problem. I'm using mock screenshots, since, like I suggested, it is better to use lossless images to get a better result. The main idea here is to extract the color of the text box and to fill the rest of the image with that color, then threshold the image. By doing this, we will reduce the intensity variation and obtain a better thresholded image - since the image histogram will contain fewer intensity values. These are the steps:
Crop the image to a ROI (Region Of Interest)
Get the colors in that ROI via K-Means
Get the color of the text box
Flood-fill the ROI with the color of the text box
Apply Otsu's thresholding to get a binary image
Get OCR of the image
Suppose this is our test images, one uses a a "light" theme while the other uses a "dark" theme:
I'll be using pyocr as OCR engine. Let's use image one, the code would be this:
# imports:
from PIL import Image
import numpy as np
import cv2
import pyocr
import pyocr.builders
tools = pyocr.get_available_tools()
# The tools are returned in the recommended order of usage
tool = tools[0]
langs = tool.get_available_languages()
lang = langs[0]
# image path
path = "D://opencvImages//"
fileName = "mockText.png"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Convert RGB to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Set the ROI location:
roiX = 0
roiY = 235
roiWidth = 750
roiHeight = 1080
# Crop the ROI:
smsROI = grayscaleImage[roiY:roiHeight, roiX:roiWidth]
The first bit crops the ROI - everything that is of interest, leaving out the "header" and the "footer" of the image, where's there's info that we really don't need. This is the current ROI:
Wouldn't be nice to (approximately) get all the colors used in the image? Fortunately that's what Color Quantization gives us - a reduced pallet of the average colors present in an image, provided the number of the colors we are looking for. Let's apply K-Means and use 3 clusters to group this colors.
In our test images, most of the pixels are background - so, the largest cluster of pixels will belong to the background. The text represents the smallest cluster of pixels. That leaves the remaining cluster our target - the color of the text box. Let's apply K-Means, then. We need to format the data before, though, because K-Means needs float re-arranged arrays:
# Reshape the data to width x height, number of channels:
kmeansData = smsROI.reshape((-1,1))
# convert the data to np.float32
kmeansData = np.float32(kmeansData)
# define criteria, number of clusters(K) and apply kmeans():
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 5, 1.0)
# Define number of clusters (3 colors):
K = 3
# Run K-means:
_, _, center = cv2.kmeans(kmeansData, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
# Convert the centers to uint8:
center = np.uint8(center)
# Sort centers from small to largest:
center = sorted(center, reverse=False)
# Get text color and min color:
textBoxColor = int(center[1][0])
minColor = min(center)[0]
print("Minimum Color is: "+str(minColor))
print("Text Box Color is: "+str(textBoxColor))
The info of interest is in center. That's where our colors are. After sorting this list and getting the minimum color value (that I'll use later to distinguish between a light and a dark theme) we can print the values. For the first test image, these values are:
Minimum Color is: 23
Text Box Color is: 225
Alright, so far so good. We have the color of the text box. Let's use that and flood-fill the entire ROI at position (x=0, y=0):
# Apply flood-fill at seed point (0,0):
cv2.floodFill(smsROI, mask=None, seedPoint=(0, 0), newVal=textBoxColor)
The result is this:
Very nice. Let's apply Otsu's thresholding on this bad boy:
# Threshold via Otsu:
_, binaryImage = cv2.threshold(smsROI, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
Now, here comes the minColor part. If you are processing a dark theme screenshot and threshold it you will get white text on black background. If you were to process a light theme screenshot you would get black text on white background. We will always produce the same no matter the input: white text and black background. Let's check the min color, if this equals 0 (black) you just received a dark theme screenshot and you don't need to invert the image. Otherwise, invert the image:
# Process "Dark Theme / Light Theme":
if minColor != 0:
# Invert image if is not already inverted:
binaryImage = 255 - binaryImage
cv2.imshow("binaryImage", binaryImage)
cv2.waitKey(0)
For our first test image, the result is:
Notice the little bits of small noise. Let's apply an area filter (function defined at the end of the post) to get rid of pixels below a certain area threshold:
# Run a minimum area filter:
minArea = 10
binaryImage = areaFilter(minArea, binaryImage)
This is the filtered image:
Very nice. Lastly, I write this image and use pyocr to get the text as a string:
cv2.imwrite(path + "ocrText.png", binaryImage)
txt = tool.image_to_string(
Image.open(path + "ocrText.png"),
lang=lang,
builder=pyocr.builders.TextBuilder()
)
print("Image text is: "+txt)
Which results in:
Image text is: 301248 is your Amazon
verification code
If you test the second image you get the same exact result. This is the definition and implementation of the areaFilter function:
def areaFilter(minArea, inputImage):
# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(inputImage, connectivity=4)
# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][4] >= minArea]
# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels) == True, 255, 0).astype('uint8')
return filteredImage

remove demarcation from text image - image processing

Hi I need to write a program that remove demarcation from gray scale image(image with text in it)
i read about thresholding and blurring but still i dont see how can i do it.
my image is an image of a hebrew text like that:
and i need to remove the demarcation(assuming that the demarcation is the smallest element in the image) the output need to be something like that
I want to write the code in python using opencv, what topics do i need to learn to be able to do that, and how?
thank you.
Edit:
I can use only cv2 functions

The symbols you want to remove are significantly smaller than all other shapes, you can use that to determine witch ones to remove.
First use threshold to convert the image to binary. Next, you can use findContours to detect the shapes and then contourArea to determine if the shape is larger than a threshold.
Finally you can can create a mask to remove the unwanted shapes, draw the larger symbols on a new image or draw the smaller symbols in white over the original symbols in the original image - making them disappear. I used that last technique in the code below.
Result:
Code:
import cv2
# load image as grayscale
img = cv2.imread('1MioS.png',0)
# convert to binary. Inverted, so you get white symbols on black background
_ , thres = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY_INV)
# find contours in the thresholded image (this gives all symbols)
contours, hierarchy = cv2.findContours(thres, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# loop through the contours, if the size of the contour is below a threshold,
# draw a white shape over it in the input image
for cnt in contours:
if cv2.contourArea(cnt) < 250:
cv2.drawContours(img,[cnt],0,(255),-1)
# display result
cv2.imshow('res', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Update
To find the largest contour, you can loop through them and keep track of the largest value:
maxArea = 0
for cnt in contours:
currArea = cv2.contourArea(cnt)
if currArea > maxArea:
maxArea = currArea
print(maxArea)
I also whipped up a little more complex version, that creates a sorted list of the indexes and sizes of the contours. Then it looks for the largest relative difference in size of all contours, so you know which contours are 'small' and 'large'. I do not know if this works for all letters / fonts.
# create a list of the indexes of the contours and their sizes
contour_sizes = []
for index,cnt in enumerate(contours):
contour_sizes.append([index,cv2.contourArea(cnt)])
# sort the list based on the contour size.
# this changes the order of the elements in the list
contour_sizes.sort(key=lambda x:x[1])
# loop through the list and determine the largest relative distance
indexOfMaxDifference = 0
currentMaxDifference = 0
for i in range(1,len(contour_sizes)):
sizeDifference = contour_sizes[i][1] / contour_sizes[i-1][1]
if sizeDifference > currentMaxDifference:
currentMaxDifference = sizeDifference
indexOfMaxDifference = i
# loop through the list again, ending (or starting) at the indexOfMaxDifference, to draw the contour
for i in range(0, indexOfMaxDifference):
cv2.drawContours(img,contours,contour_sizes[i][0] ,(255),-1)
To get the background color you can do use minMaxLoc. This returns the lowest color value and it's position of an image (also the max value, but you don't need that). If you apply it to the thresholded image - where the background is black -, it will return the location of a background pixel (big odds it will be (0,0) ). You can then look up this pixel in the original color image.
# get the location of a pixel with background color
min_val, _, min_loc, _ = cv2.minMaxLoc(thres)
# load color image
img_color = cv2.imread('1MioS.png')
# get bgr values of background
b,g,r = img_color[min_loc]
# convert from numpy object
background_color = (int(b),int(g),int(r))
and then to draw the contours
cv2.drawContours(img_color,contours,contour_sizes[i][0],background_color,-1)
and of course
cv2.imshow('res', img_color)

This looks like a problem for template matching since you have what looks like a known font and can easily understand what the characters and/or demarcations are. Check out https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_template_matching/py_template_matching.html
Admittedly, the tutorial talks about finding the match; modification is up to you. In that case, you know the exact shape of the template itself, so using that information along with the location of the match, just overwrite the image data with the appropriate background color (based on the examples above, 255).

You can solve it by removing all the small clusters.
I found a Python solution (using OpenCV) here.
For supporting smaller fonts, I added the following heuristic:
"The largest size of the demarcation cluster is 1/500 of the largest letter cluster".
The heuristic can be refined, by statistical analysts (or improved by other heuristics, such as demarcation locations relative to the letters).
import numpy as np
import cv2
I = cv2.imread('Goodluck.png', cv2.IMREAD_GRAYSCALE)
J = 255 - I # Invert I
img = cv2.threshold(J, 127, 255, cv2.THRESH_BINARY)[1] # Convert to binary
# https://answers.opencv.org/question/194566/removing-noise-using-connected-components/
nlabel,labels,stats,centroids = cv2.connectedComponentsWithStats(img, connectivity=8)
labels_small = []
areas_small = []
# Find largest cluster:
max_size = np.max(stats[:, cv2.CC_STAT_AREA])
thresh_size = max_size / 500 # Set the threshold to maximum cluster size divided by 500.
for i in range(1, nlabel):
if stats[i, cv2.CC_STAT_AREA] < thresh_size:
labels_small.append(i)
areas_small.append(stats[i, cv2.CC_STAT_AREA])
mask = np.ones_like(labels, dtype=np.uint8)
for i in labels_small:
I[labels == i] = 255
cv2.imshow('I', I)
cv2.waitKey(0)
Here is a MATLAB code sample (kept threshold = 200):
clear
I = imbinarize(rgb2gray(imread('בהצלחה.png')));
figure;imshow(I);
J = ~I;
%Clustering
CC = bwconncomp(J);
%Cover all small clusters with zewros.
for i = 1:CC.NumObjects
C = CC.PixelIdxList{i}; %Cluster coordinates.
%Fill small clusters with zeros.
if numel(C) < 200
J(C) = 0;
end
end
J = ~J;
figure;imshow(J);
Result:

Automatic contrast and brightness adjustment of a color photo of a sheet of paper with OpenCV

When photographing a sheet of paper (e.g. with phone camera), I get the following result (left image) (jpg download here). The desired result (processed manually with an image editing software) is on the right:
I would like to process the original image with openCV to get a better brightness/contrast automatically (so that the background is more white).
Assumption: the image has an A4 portrait format (we don't need to perspective-warp it in this topic here), and the sheet of paper is white with possibly text/images in black or colors.
What I've tried so far:
Various adaptive thresholding methods such as Gaussian, OTSU (see OpenCV doc Image Thresholding). It usually works well with OTSU:
ret, gray = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY)
but it only works for grayscale images and not directly for color images. Moreover, the output is binary (white or black), which I don't want: I prefer to keep a color non-binary image as output
Histogram equalization
applied on Y (after RGB => YUV transform)
or applied on V (after RGB => HSV transform),
as suggested by this answer (Histogram equalization not working on color image - OpenCV) or this one (OpenCV Python equalizeHist colored image):
img3 = cv2.imread(f)
img_transf = cv2.cvtColor(img3, cv2.COLOR_BGR2YUV)
img_transf[:,:,0] = cv2.equalizeHist(img_transf[:,:,0])
img4 = cv2.cvtColor(img_transf, cv2.COLOR_YUV2BGR)
cv2.imwrite('test.jpg', img4)
or with HSV:
img_transf = cv2.cvtColor(img3, cv2.COLOR_BGR2HSV)
img_transf[:,:,2] = cv2.equalizeHist(img_transf[:,:,2])
img4 = cv2.cvtColor(img_transf, cv2.COLOR_HSV2BGR)
Unfortunately, the result is quite bad since it creates awful micro contrasts locally (?):
I also tried YCbCr instead, and it was similar.
I also tried CLAHE (Contrast Limited Adaptive Histogram Equalization) with various tileGridSize from 1 to 1000:
img3 = cv2.imread(f)
img_transf = cv2.cvtColor(img3, cv2.COLOR_BGR2HSV)
clahe = cv2.createCLAHE(tileGridSize=(100,100))
img_transf[:,:,2] = clahe.apply(img_transf[:,:,2])
img4 = cv2.cvtColor(img_transf, cv2.COLOR_HSV2BGR)
cv2.imwrite('test.jpg', img4)
but the result was equally awful too.
Doing this CLAHE method with LAB color space, as suggested in the question How to apply CLAHE on RGB color images:
import cv2, numpy as np
bgr = cv2.imread('_example.jpg')
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
lab_planes = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0,tileGridSize=(100,100))
lab_planes[0] = clahe.apply(lab_planes[0])
lab = cv2.merge(lab_planes)
bgr = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
cv2.imwrite('_example111.jpg', bgr)
gave bad result too. Output image:
Do an adaptive thresholding or histogram equalization separately on each channel (R, G, B) is not an option since it would mess with the color balance, as explained here.
"Contrast strechting" method from scikit-image's tutorial on Histogram Equalization:
the image is rescaled to include all intensities that fall within the 2nd and 98th percentiles
is a little bit better, but still far from the desired result (see image on top of this question).
TL;DR: how to get an automatic brightness/contrast optimization of a color photo of a sheet of paper with OpenCV/Python? What kind of thresholding/histogram equalization/other technique could be used?

Contrast and brightness can be adjusted using alpha (α) and beta (β), respectively. These variables are often called the gain and bias parameters. The expression can be written as
OpenCV already implements this as cv2.convertScaleAbs() so we can just use this function with user defined alpha and beta values.
import cv2
image = cv2.imread('1.jpg')
alpha = 1.95 # Contrast control (1.0-3.0)
beta = 0 # Brightness control (0-100)
manual_result = cv2.convertScaleAbs(image, alpha=alpha, beta=beta)
cv2.imshow('original', image)
cv2.imshow('manual_result', manual_result)
cv2.waitKey()
But the question was
How to get an automatic brightness/contrast optimization of a color photo?
Essentially the question is how to automatically calculate alpha and beta. To do this, we can look at the histogram of the image. Automatic brightness and contrast optimization calculates alpha and beta so that the output range is [0...255]. We calculate the cumulative distribution to determine where color frequency is less than some threshold value (say 1%) and cut the right and left sides of the histogram. This gives us our minimum and maximum ranges. Here's a visualization of the histogram before (blue) and after clipping (orange). Notice how the more "interesting" sections of the image are more pronounced after clipping.
To calculate alpha, we take the minimum and maximum grayscale range after clipping and divide it from our desired output range of 255
α = 255 / (maximum_gray - minimum_gray)
To calculate beta, we plug it into the formula where g(i, j)=0 and f(i, j)=minimum_gray
g(i,j) = α * f(i,j) + β
which after solving results in this
β = -minimum_gray * α
For your image we get this
Alpha: 3.75
Beta: -311.25
You may have to adjust the clipping threshold value to refine results. Here's some example results using a 1% threshold with other images: Before -> After
Automated brightness and contrast code
import cv2
import numpy as np
from matplotlib import pyplot as plt
# Automatic brightness and contrast optimization with optional histogram clipping
def automatic_brightness_and_contrast(image, clip_hist_percent=1):
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Calculate grayscale histogram
hist = cv2.calcHist([gray],[0],None,[256],[0,256])
hist_size = len(hist)
# Calculate cumulative distribution from the histogram
accumulator = []
accumulator.append(float(hist[0]))
for index in range(1, hist_size):
accumulator.append(accumulator[index -1] + float(hist[index]))
# Locate points to clip
maximum = accumulator[-1]
clip_hist_percent *= (maximum/100.0)
clip_hist_percent /= 2.0
# Locate left cut
minimum_gray = 0
while accumulator[minimum_gray] < clip_hist_percent:
minimum_gray += 1
# Locate right cut
maximum_gray = hist_size -1
while accumulator[maximum_gray] >= (maximum - clip_hist_percent):
maximum_gray -= 1
# Calculate alpha and beta values
alpha = 255 / (maximum_gray - minimum_gray)
beta = -minimum_gray * alpha
'''
# Calculate new histogram with desired range and show histogram
new_hist = cv2.calcHist([gray],[0],None,[256],[minimum_gray,maximum_gray])
plt.plot(hist)
plt.plot(new_hist)
plt.xlim([0,256])
plt.show()
'''
auto_result = cv2.convertScaleAbs(image, alpha=alpha, beta=beta)
return (auto_result, alpha, beta)
image = cv2.imread('1.jpg')
auto_result, alpha, beta = automatic_brightness_and_contrast(image)
print('alpha', alpha)
print('beta', beta)
cv2.imshow('auto_result', auto_result)
cv2.waitKey()
Result image with this code:
Results with other images using a 1% threshold
An alternative version is to add gain and bias to an image using saturation arithmetic instead of using OpenCV's cv2.convertScaleAbs(). The built-in method does not take an absolute value, which would lead to nonsensical results (e.g., a pixel at 44 with alpha = 3 and beta = -210 becomes 78 with OpenCV, when in fact it should become 0).
import cv2
import numpy as np
# from matplotlib import pyplot as plt
def convertScale(img, alpha, beta):
"""Add bias and gain to an image with saturation arithmetics. Unlike
cv2.convertScaleAbs, it does not take an absolute value, which would lead to
nonsensical results (e.g., a pixel at 44 with alpha = 3 and beta = -210
becomes 78 with OpenCV, when in fact it should become 0).
"""
new_img = img * alpha + beta
new_img[new_img < 0] = 0
new_img[new_img > 255] = 255
return new_img.astype(np.uint8)
# Automatic brightness and contrast optimization with optional histogram clipping
def automatic_brightness_and_contrast(image, clip_hist_percent=25):
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Calculate grayscale histogram
hist = cv2.calcHist([gray],[0],None,[256],[0,256])
hist_size = len(hist)
# Calculate cumulative distribution from the histogram
accumulator = []
accumulator.append(float(hist[0]))
for index in range(1, hist_size):
accumulator.append(accumulator[index -1] + float(hist[index]))
# Locate points to clip
maximum = accumulator[-1]
clip_hist_percent *= (maximum/100.0)
clip_hist_percent /= 2.0
# Locate left cut
minimum_gray = 0
while accumulator[minimum_gray] < clip_hist_percent:
minimum_gray += 1
# Locate right cut
maximum_gray = hist_size -1
while accumulator[maximum_gray] >= (maximum - clip_hist_percent):
maximum_gray -= 1
# Calculate alpha and beta values
alpha = 255 / (maximum_gray - minimum_gray)
beta = -minimum_gray * alpha
'''
# Calculate new histogram with desired range and show histogram
new_hist = cv2.calcHist([gray],[0],None,[256],[minimum_gray,maximum_gray])
plt.plot(hist)
plt.plot(new_hist)
plt.xlim([0,256])
plt.show()
'''
auto_result = convertScale(image, alpha=alpha, beta=beta)
return (auto_result, alpha, beta)
image = cv2.imread('1.jpg')
auto_result, alpha, beta = automatic_brightness_and_contrast(image)
print('alpha', alpha)
print('beta', beta)
cv2.imshow('auto_result', auto_result)
cv2.imwrite('auto_result.png', auto_result)
cv2.imshow('image', image)
cv2.waitKey()

Robust Locally-Adaptive Soft Binarization! That's what I call it.
I've done similar stuffs before, for a bit different purpose, so this may not perfectly fit for your needs, but hope it helps (also I wrote this code at night for personal use so it's ugly). In a sense, this code was intended to solve a more general case compared to yours, where we can have a lot of structured noise on the background (see demo below).
What this code does? Given a photo of a sheet of paper, it will whiten it so that it can be perfectly printable. See example images below.
Teaser: that's how your pages will look like after this algorithm (before and after). Notice that even the color marker annotations are gone, so I don't know if this will fit your use case but the code might be useful:
To get a perfectly clean results, you might need to toy around with filtering parameters a bit, but as you can see, even with default parameters it works quite well.
Step 0: Cut the images to fit closely to the page
Let's asume you somehow did this step (it seems like that in the examples you provided). If you need a manual annotate-and-rewarp tool, just pm me! ^^ The results of this step is below (the examples I use here are arguably harder than the one you provided, whilst it may not exactly match your case):
From this we can immediately see the following problems:
Lightening condition is not even. This means all simple binarization methods won't work. I tried a lot of solutions available in OpenCV, as well as their combinations, none of them worked!
A lot of background noise. In my case, I needed to remove the grid of the paper, and also the ink from the other side of the paper that is visible through the thin sheet.
Step 1: Gamma correction
The reasoning of this step is to balance out the contrast of the whole image (since your image can be slightly overexposed/underexposed depending to the lighting condition).
This may seem at first as an unnecessary step, but the importance of it cannot be underestimated: in a sense, it normalizes the images to the similar distributions of exposures, so that you can choose meaningful hyper-parameters later (e.g. the DELTA parameter in next section, the noise filtering parameters, parameters for morphological stuffs, etc.)
# Somehow I found the value of `gamma=1.2` to be the best in my case
def adjust_gamma(image, gamma=1.2):
# build a lookup table mapping the pixel values [0, 255] to
# their adjusted gamma values
invGamma = 1.0 / gamma
table = np.array([((i / 255.0) ** invGamma) * 255
for i in np.arange(0, 256)]).astype("uint8")
# apply gamma correction using the lookup table
return cv2.LUT(image, table)
Here are results of gamma adjusting:
You can see that it is a bit more... "balanced" now. Without this step, all parameters that you will pick by hand in later steps will become less robust!
Step 2: Adaptive Binarization to Detect the Text Blobs
In this step, we will adaptively binarize out the text blobs. I will add more comments later, but the idea basically is following:
We divide the image into blocks of size BLOCK_SIZE. The trick is to choose its size large enough so that you still get a large chunk of text and background (i.e. larger than any symbols that you have), but small enough to not suffer from any lightening condition variations (i.e. "large, but still local").
Inside each block, we do locally-adaptive binarization: we look at the median value and hypothesize that it is the background (because we chose the BLOCK_SIZE large enough to have the majority of it to be background). Then, we further define DELTA — basically just a threshold of "how far away from median we will still consider it as background?".
So, the function process_image gets the job done. Moreover, you can modify the preprocess and postprocess functions to fit your need (however, as you can see from the example above, the algorithm is pretty robust, i.e. it works quite well out-of-the-box without modifying too much the parameters).
The code of this part assumes the foreground to be darker than the background (i.e. ink on paper). But you can easily change that by tweaking the preprocess function: instead of 255 - image, return just image.
# These are probably the only important parameters in the
# whole pipeline (steps 0 through 3).
BLOCK_SIZE = 40
DELTA = 25
# Do the necessary noise cleaning and other stuffs.
# I just do a simple blurring here but you can optionally
# add more stuffs.
def preprocess(image):
image = cv2.medianBlur(image, 3)
return 255 - image
# Again, this step is fully optional and you can even keep
# the body empty. I just did some opening. The algorithm is
# pretty robust, so this stuff won't affect much.
def postprocess(image):
kernel = np.ones((3,3), np.uint8)
image = cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)
return image
# Just a helper function that generates box coordinates
def get_block_index(image_shape, yx, block_size):
y = np.arange(max(0, yx[0]-block_size), min(image_shape[0], yx[0]+block_size))
x = np.arange(max(0, yx[1]-block_size), min(image_shape[1], yx[1]+block_size))
return np.meshgrid(y, x)
# Here is where the trick begins. We perform binarization from the
# median value locally (the img_in is actually a slice of the image).
# Here, following assumptions are held:
# 1. The majority of pixels in the slice is background
# 2. The median value of the intensity histogram probably
# belongs to the background. We allow a soft margin DELTA
# to account for any irregularities.
# 3. We need to keep everything other than the background.
#
# We also do simple morphological operations here. It was just
# something that I empirically found to be "useful", but I assume
# this is pretty robust across different datasets.
def adaptive_median_threshold(img_in):
med = np.median(img_in)
img_out = np.zeros_like(img_in)
img_out[img_in - med < DELTA] = 255
kernel = np.ones((3,3),np.uint8)
img_out = 255 - cv2.dilate(255 - img_out,kernel,iterations = 2)
return img_out
# This function just divides the image into local regions (blocks),
# and perform the `adaptive_mean_threshold(...)` function to each
# of the regions.
def block_image_process(image, block_size):
out_image = np.zeros_like(image)
for row in range(0, image.shape[0], block_size):
for col in range(0, image.shape[1], block_size):
idx = (row, col)
block_idx = get_block_index(image.shape, idx, block_size)
out_image[block_idx] = adaptive_median_threshold(image[block_idx])
return out_image
# This function invokes the whole pipeline of Step 2.
def process_image(img):
image_in = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
image_in = preprocess(image_in)
image_out = block_image_process(image_in, BLOCK_SIZE)
image_out = postprocess(image_out)
return image_out
The results are nice blobs like this, closely following the ink trace:
Step 3: The "Soft" Part of Binarization
Having the blobs that covers the symbols and a little bit more, we can finally do the whitening procedure.
If we look more closely at the photos of sheets of papers with text (especially those that have hand writings), the transformation from "background" (white paper) to "foreground" (the dark color ink) is not sharp, but very gradual. Other binarization-based answers in this section proposes a simple thresholding (even if they are locally-adaptive, it is still a threshold), which works okay for printed text, but will produce not-so-pretty results with hand writings.
So, the motivation of this section is that we want to preserve that effect of gradual transmission from black to white, just as natural photos of sheets of papers with natural ink. The final purpose for that is to make it printable.
The main idea is simple: the more the pixel value (after thresholding above) differs from the local min value, the more likely it is belonging to the background. We can express this using a family of Sigmoid functions, re-scaled to the range of local block (so that this function is adaptively scaled thorough the image).
# This is the function used for composing
def sigmoid(x, orig, rad):
k = np.exp((x - orig) * 5 / rad)
return k / (k + 1.)
# Here, we combine the local blocks. A bit lengthy, so please
# follow the local comments.
def combine_block(img_in, mask):
# First, we pre-fill the masked region of img_out to white
# (i.e. background). The mask is retrieved from previous section.
img_out = np.zeros_like(img_in)
img_out[mask == 255] = 255
fimg_in = img_in.astype(np.float32)
# Then, we store the foreground (letters written with ink)
# in the `idx` array. If there are none (i.e. just background),
# we move on to the next block.
idx = np.where(mask == 0)
if idx[0].shape[0] == 0:
img_out[idx] = img_in[idx]
return img_out
# We find the intensity range of our pixels in this local part
# and clip the image block to that range, locally.
lo = fimg_in[idx].min()
hi = fimg_in[idx].max()
v = fimg_in[idx] - lo
r = hi - lo
# Now we use good old OTSU binarization to get a rough estimation
# of foreground and background regions.
img_in_idx = img_in[idx]
ret3,th3 = cv2.threshold(img_in[idx],0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# Then we normalize the stuffs and apply sigmoid to gradually
# combine the stuffs.
bound_value = np.min(img_in_idx[th3[:, 0] == 255])
bound_value = (bound_value - lo) / (r + 1e-5)
f = (v / (r + 1e-5))
f = sigmoid(f, bound_value + 0.05, 0.2)
# Finally, we re-normalize the result to the range [0..255]
img_out[idx] = (255. * f).astype(np.uint8)
return img_out
# We do the combination routine on local blocks, so that the scaling
# parameters of Sigmoid function can be adjusted to local setting
def combine_block_image_process(image, mask, block_size):
out_image = np.zeros_like(image)
for row in range(0, image.shape[0], block_size):
for col in range(0, image.shape[1], block_size):
idx = (row, col)
block_idx = get_block_index(image.shape, idx, block_size)
out_image[block_idx] = combine_block(
image[block_idx], mask[block_idx])
return out_image
# Postprocessing (should be robust even without it, but I recommend
# you to play around a bit and find what works best for your data.
# I just left it blank.
def combine_postprocess(image):
return image
# The main function of this section. Executes the whole pipeline.
def combine_process(img, mask):
image_in = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
image_out = combine_block_image_process(image_in, mask, 20)
image_out = combine_postprocess(image_out)
return image_out
Some stuffs are commented since they are optional. The combine_process function takes the mask from the previous step, and executes the whole composition pipeline. You can try to toy with them for your specific data (images). The results are neat:
Probably I will add more comments and explanations to the code in this answer. Will upload the whole thing (together with cropping and warping code) on Github.

This method should work well for your application. First you find a threshold value that separates the distribution modes well in the intensity histogram then rescale the intensity using that value.
from skimage.filters import threshold_yen
from skimage.exposure import rescale_intensity
from skimage.io import imread, imsave
img = imread('mY7ep.jpg')
yen_threshold = threshold_yen(img)
bright = rescale_intensity(img, (0, yen_threshold), (0, 255))
imsave('out.jpg', bright)
I'm here using Yen's method, can learn more about this method on this page.

I think the way to do that is 1) Extract the chroma (saturation) channel from HCL colorspace. (HCL works better than HSL or HSV). Only colors should have non-zero saturation, so bright, and gray shades will be dark. 2) Threshold that result using otsu thresholding to use as a mask. 3) Convert your input to grayscale and apply local area (i.e., adaptive) thresholding. 4) put the mask into the alpha channel of the original and then composite the local area thresholded result with the original, so that it keeps the colored area from the original and everywhere else uses the local area thresholded result.
Sorry, I do not know OpeCV that well, but here are the steps using ImageMagick.
Note that channels are numbered starting with 0. (H=0 or red, C=1 or green, L=2 or blue)
Input:
magick image.jpg -colorspace HCL -channel 1 -separate +channel tmp1.png
magick tmp1.png -auto-threshold otsu tmp2.png
magick image.jpg -colorspace gray -negate -lat 20x20+10% -negate tmp3.png
magick tmp3.png \( image.jpg tmp2.png -alpha off -compose copy_opacity -composite \) -compose over -composite result.png
ADDITION:
Here is Python Wand code, which produces the same output result. It needs Imagemagick 7 and Wand 0.5.5.
#!/bin/python3.7
from wand.image import Image
from wand.display import display
from wand.version import QUANTUM_RANGE
with Image(filename='text.jpg') as img:
with img.clone() as copied:
with img.clone() as hcl:
hcl.transform_colorspace('hcl')
with hcl.channel_images['green'] as mask:
mask.auto_threshold(method='otsu')
copied.composite(mask, left=0, top=0, operator='copy_alpha')
img.transform_colorspace('gray')
img.negate()
img.adaptive_threshold(width=20, height=20, offset=0.1*QUANTUM_RANGE)
img.negate()
img.composite(copied, left=0, top=0, operator='over')
img.save(filename='text_process.jpg')

First we separate text and color markings. This can be done in a color space with a color saturation channel. I used instead a very simple method inspired by this paper: the ration of min(R,G,B)/ max(R,G,B) will be near 1 for (light) gray areas and << 1 for colored areas. For dark gray areas we get anything between 0 and 1, but this doesn't matter: either these areas go to the color mask and are then added as is or they are not included in the mask and are contributed to the output from the binarized text. For black we use the fact that 0/0 becomes 0 when converted to uint8.
The grayscale image text gets locally thresholded to produce a black and white image. You can pick your favorite technique from this comparison or that survey. I chose the NICK technique that copes well with low contrast and is rather robust, i.e. the choice of the parameter k between about -0.3 and -0.1 works well for a very wide range of conditions which is good for automatic processing. For the sample document provided the chosen technique doesn't play a big role as it is relatively uniformly illuminated, but in order to cope with non-uniformly illuminated images it should be a local thresholding technique.
In the final step, the color areas are added back to the binarized text image.
So this solution is very similar to #fmw42's solution (all credit for the idea to him) with the exception of the different color detection and binarization methods.
image = cv2.imread('mY7ep.jpg')
# make mask and inverted mask for colored areas
b,g,r = cv2.split(cv2.blur(image,(5,5)))
np.seterr(divide='ignore', invalid='ignore') # 0/0 --> 0
m = (np.fmin(np.fmin(b, g), r) / np.fmax(np.fmax(b, g), r)) * 255
_,mask_inv = cv2.threshold(np.uint8(m), 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
mask = cv2.bitwise_not(mask_inv)
# local thresholding of grayscale image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
text = cv2.ximgproc.niBlackThreshold(gray, 255, cv2.THRESH_BINARY, 41, -0.1, binarizationMethod=cv2.ximgproc.BINARIZATION_NICK)
# create background (text) and foreground (color markings)
bg = cv2.bitwise_and(text, text, mask = mask_inv)
fg = cv2.bitwise_and(image, image, mask = mask)
out = cv2.add(cv2.cvtColor(bg, cv2.COLOR_GRAY2BGR), fg)
If you don't need the color markings, you can simply binarize the grayscale image:
image = cv2.imread('mY7ep.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
text = cv2.ximgproc.niBlackThreshold(gray, 255, cv2.THRESH_BINARY, at_bs, -0.3, binarizationMethod=cv2.ximgproc.BINARIZATION_NICK)

This is a C# transpilation(performed via https://github.com/uxmal/pytocs) for nathancy's answer for Emgu.CV wrapper library:
/// <summary>
/// <see>https://stackoverflow.com/questions/56905592/automatic-contrast-and-brightness-adjustment-of-a-color-photo-of-a-sheet-of-pape/75455163#75455163</see>
/// </summary>
public static (Mat autoResult, int alpha, int beta) AutomaticBrightnessAndContrast(Mat image, double clipHistPercent = 1)
{
var gray = new Mat();
CvInvoke.CvtColor(image, gray, ColorConversion.Bgr2Gray);
// Calculate grayscale histogram
var hist = new Mat();
var grayVector = new VectorOfMat(gray);
CvInvoke.CalcHist(grayVector, new[] {0}, null, hist, new[] {256}, new[] {0f, 256}, false);
var histSize = hist.Rows;
// Calculate cumulative distribution from the histogram
var accumulator = new List<float> {hist.Get<float>(0, 0)};
foreach (var index in Enumerable.Range(1, histSize - 1))
accumulator.Add(accumulator[index - 1] + hist.Get<float>(index, 0));
// Locate points to clip
var maximum = accumulator[255];
clipHistPercent *= maximum / 100.0;
clipHistPercent /= 2.0;
// Locate left cut
var minimumGray = 0;
while (accumulator[minimumGray] < clipHistPercent)
minimumGray += 1;
// Locate right cut
var maximumGray = histSize - 1;
while (accumulator[maximumGray] >= maximum - clipHistPercent)
maximumGray -= 1;
// Calculate alpha and beta values
var alpha = 255 / (maximumGray - minimumGray);
var beta = -minimumGray * alpha;
var autoResult = new Mat();
CvInvoke.ConvertScaleAbs(image, autoResult, alpha, beta);
return (autoResult, alpha, beta);
}
public static class MatExtension
{
/// <summary>
/// <see>https://stackoverflow.com/questions/32255440/how-can-i-get-and-set-pixel-values-of-an-emgucv-mat-image/69537504#69537504</see>
/// </summary>
public static unsafe T Get<T>(this Mat mat, int row, int col) =>
new ReadOnlySpan<T>(mat.DataPointer.ToPointer(), mat.Rows * mat.Cols * mat.ElementSize)
[(row * mat.Cols) + col];
}
If you are using OpenCvSharp, just modify all invokes to OpenCV with updated parameters like Rotate an image without cropping in OpenCV in C++
Also note that OpenCvSharp already has Mat.Set<> method that functions same as mat.at<> in the original OpenCV, so we don't have to copy these methods from How can I get and set pixel values of an EmguCV Mat image?

Detect if an OCR text image is upside down

I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon
filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I) # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))
# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most of the documents, except with some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e it does not make difference between (180 and 0) and (90 and 270)). So I get a lot of upside-down documents.
Here is an example:
The resulted image that I get is the same as the input image.
Is there any suggestion to detect if an image is upside down using Opencv and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, it is possible for Tesseract to detect the orientation. However, when the image has few lines, the orientation angle suggested by Tesseract is usually wrong. So this can not be a 100% solution.

Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score keeping method. Score each image for it's likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability, the full code can be found here.
# Rotate the image around in a circle
angle = 0
while angle <= 360:
# Rotate the source image
img = rotate(src, angle)
# Crop the center 1/3rd of the image (roi is filled with text)
h,w = img.shape
buffer = min(h, w) - int(min(h,w)/1.15)
roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
# Create background to draw transform on
bg = np.zeros((buffer*2, buffer*2), np.uint8)
# Compute the sums of the rows
row_sums = sum_rows(roi)
# High score --> Zebra stripes
score = np.count_nonzero(row_sums)
scores.append(score)
# Image has best rotation
if score <= min(scores):
# Save the rotatied image
print('found optimal rotation')
best_rotation = img.copy()
k = display_data(roi, row_sums, buffer)
if k == 27: break
# Increment angle and try again
angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()

Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach; A few lines of you example image
Blue: Original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
line = smooth[start:end]
if np.argmax(line) < len(line)/2:
lower_peaks += 1
print(lower_peaks / len(line_starts))
this prints 0.125 for the given image, so this is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.

You can use the Alyn module. To install it:
pip install alyn
Then to use it to deskew images(Taken from the homepage):
from alyn import Deskew
d = Deskew(
input_file='path_to_file',
display_image='preview the image on screen',
output_file='path_for_deskewed image',
r_angle='offest_angle_in_degrees_to_control_orientation')`
d.run()
Note that Alyn is only for deskewing text.

Opencv: How can I get the eye color

I am using dblib to get the eyes of a face. Below are some examples of the results.
I have tried several methods to accomplish the objective. For instance, I tried to detect the center of the eye based on this project; from that, it would be easy to detect the pupil and the iris, however, I did not achieve good results. I also have tried to use Hough Circles but in some cases the results are quite bad.
My best bet is to detect the pupil, which is the only part of the eye with a common color (black) for every eye. I would like to get some ideas to do so.
My first idea is to set a region (between 20 and 60 in the x axis), then, in gray-scale, make the dark pixels (less than 25, for instance) black, and the rest, white. That would create a mask, that can be blurred to use Hough Circles and detect the region of the pupil. Finally, I can set a radius for the iris.
Any idea would be appreciated.
Thanks.

Actually your idea of detecting the shape of the pupil is good but your pictures are not good enough to do it directly. An easy way is to pre-process those to remove all useless data.
I did some example with one of your original pics to show you (on Gimp)
Go to grey scale
Do a high pass filter to remove all small color fluctuations (you have very distinct colors so it should enhance borders very well)
Link to example filtered pic
Apply a threshold on your picture to remove remaining fluctuations (you can calculate the reference threshold value by analyzing your grey scale image color histogram)
Link to example thresholded pic
After those three steps you should have enough data to run your shape detection.

Most of the answers I have read till now say to use the Hough circle method to detect the iris region, but it doesn't really work on all images.
So my approach is pretty simple, which involves following steps
Detect face from the image
Find eye region from the face
Get the RGB values just below the pupil region(thereby getting the iris region RGB values)
And pass the obtained RGB values to find_color function
NOTE: Pass High-resolution image as the input for better results. If you pass low-resolution images such as 480x620, 320x240, you might end up getting poor results.
Below is the code for the same
import cv2
import imutils
from imutils import face_utils
import dlib
import numpy as np
import webcolors
flag=0
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
img= cv2.imread('blue2.jpg')
img_rgb= cv2.cvtColor(img,cv2.COLOR_BGR2RGB) #convert to RGB
#cap = cv2.VideoCapture(0) #turns on the webcam
(left_Start, left_End) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
#points for left eye and right eye
(right_Start, right_End) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]
def find_color(requested_colour): #finds the color name from RGB values
min_colours = {}
for name, key in webcolors.CSS3_HEX_TO_NAMES.items():
r_c, g_c, b_c = webcolors.hex_to_rgb(name)
rd = (r_c - requested_colour[0]) ** 2
gd = (g_c - requested_colour[1]) ** 2
bd = (b_c - requested_colour[2]) ** 2
min_colours[(rd + gd + bd)] = key
closest_name = min_colours[min(min_colours.keys())]
return closest_name
#ret, frame=cap.read()
#frame = cv2.flip(frame, 1)
#cv2.imshow(winname='face',mat=frame)
gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
# detect dlib face rectangles in the grayscale frame
dlib_faces = detector(gray, 0)
for face in dlib_faces:
eyes = [] # store 2 eyes
# convert dlib rect to a bounding box
(x,y,w,h) = face_utils.rect_to_bb(face)
cv2.rectangle(img_rgb,(x,y),(x+w,y+h),(255,0,0),1) #draws blue box over face
shape = predictor(gray, face)
shape = face_utils.shape_to_np(shape)
leftEye = shape[left_Start:left_End]
# indexes for left eye key points
rightEye = shape[right_Start:right_End]
eyes.append(leftEye) # wrap in a list
eyes.append(rightEye)
for index, eye in enumerate(eyes):
flag+=1
left_side_eye = eye[0] # left edge of eye
right_side_eye = eye[3] # right edge of eye
top_side_eye = eye[1] # top side of eye
bottom_side_eye = eye[4] # bottom side of eye
# calculate height and width of dlib eye keypoints
eye_width = right_side_eye[0] - left_side_eye[0]
eye_height = bottom_side_eye[1] - top_side_eye[1]
# create bounding box with buffer around keypoints
eye_x1 = int(left_side_eye[0] - 0 * eye_width)
eye_x2 = int(right_side_eye[0] + 0 * eye_width)
eye_y1 = int(top_side_eye[1] - 1 * eye_height)
eye_y2 = int(bottom_side_eye[1] + 0.75 * eye_height)
# draw bounding box around eye roi
#cv2.rectangle(img_rgb,(eye_x1, eye_y1), (eye_x2, eye_y2),(0,255,0),2)
roi_eye = img_rgb[eye_y1:eye_y2 ,eye_x1:eye_x2] # desired EYE Region(RGB)
if flag==1:
break
x=roi_eye.shape
row=x[0]
col=x[1]
# this is the main part,
# where you pick RGB values from the area just below pupil
array1=roi_eye[row//2:(row//2)+1,int((col//3)+3):int((col//3))+6]
array1=array1[0][2]
array1=tuple(array1) #store it in tuple and pass this tuple to "find_color" Funtion
print(find_color(array1))
cv2.imshow("frame",roi_eye)
cv2.waitKey(0)
cv2.destroyAllWindows()
Below are some examples.
An actress with blue eyes
Now this is the output of our code when the above image is given as the input: lightsteelblue
An actress with brown eyes
The output of our code when the above image is given as the input: saddlebrown
Mila kunis (one brown eye and other is green)
The output of our code when the above image is given as the input: sienna(shade of brown)
An actress with grey eyes
The output of our code when the above image is given as the input: darkgrey
So, you can see how close the results are to the actual eye color. This works pretty well with high-resolution images as I already mentioned.
PS: Correct me if am wrong, open to suggestions.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Trying to join 3 pictures - python

Related

Removing the underline of a URL in the image of a text message screenshot with python opencv and matplotlib

remove demarcation from text image - image processing

Automatic contrast and brightness adjustment of a color photo of a sheet of paper with OpenCV

Detect if an OCR text image is upside down

Opencv: How can I get the eye color

Categories

Resources