I was wondering how I would replicate what is being done in this image:
To break it down:
Get facial landmarks using dlib (green dots)
Rotate the image so that the eyes are horizontal
Find the midpoint of the face by averaging the leftmost and rightmost landmarks (blue dot) and center the image on the x-axis
Fix the position along the y-axis by placing the eye center 45% from the top of the image, and the mouth center 25% from the bottom
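For step 2, something like this is what I have in mind (a rough sketch; the eye centers here are made-up stand-ins for averaged dlib eye landmarks):
import cv2
import numpy as np

img = cv2.imread('face.jpg')  # hypothetical input
# made-up eye centers; in practice, average the dlib eye landmarks
left_eye, right_eye = (145, 120), (205, 112)
# angle of the line joining the eyes
angle = np.degrees(np.arctan2(right_eye[1] - left_eye[1],
                              right_eye[0] - left_eye[0]))
eyes_mid = ((left_eye[0] + right_eye[0]) // 2, (left_eye[1] + right_eye[1]) // 2)
# rotate about the midpoint between the eyes so the eye line becomes horizontal
M = cv2.getRotationMatrix2D(eyes_mid, angle, 1.0)
rotated = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))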
Right now this is what I have:
I am kind of stuck on step 3, which I think can be done by an affine transform? But I'm completely stumped on step 4, I have no idea how I would achieve it.
Please tell me if you need me to provide the code!
EDIT: After looking at Gal Dreiman's answer I was able to center the face perfectly, so that the blue dot is in the center of my image.
Although when I implemented the second part of his answer, I ended up getting something like this:
I see that the points have been transformed to the right places, but it's not the outcome I had desired, as the image is sheared quite dramatically. Any ideas?
EDIT 2:
After switching the x, y coordinates for the center points around, this is what I got:
As for your section 3, the easiest way to do it is:
Find the faces in the image:
faces = faceCascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(30, 30),
    flags=cv2.CASCADE_SCALE_IMAGE  # cv2.cv.CV_HAAR_SCALE_IMAGE in OpenCV 2
)
For each face calculate the midpoint:
for (x, y, w, h) in faces:
    mid_x = x + int(w / 2)
    mid_y = y + int(h / 2)
Affine transform the image to center the blue dot you already calculated:
height, width = img.shape
x_dot = ...
y_dot = ...
dx_dot = int(width / 2) - x_dot
dy_dot = int(height / 2) - y_dot
M = np.float32([[1, 0, dx_dot], [0, 1, dy_dot]])
dst = cv2.warpAffine(img, M, (width, height))
Hope it was helpful.
Edit:
Regarding section 4:
In order to stretch (resize) the image, all you have to do is perform an affine transform. To find the transformation matrix, we need three points from the input image and their corresponding locations in the output image.
p_1 = [eye_x, eye_y]
p_2 = [int(width / 2), int(height / 2)]  # default: center of the image
p_3 = [mouth_x, mouth_y]
target_p_1 = [eye_x, int(height * 0.45)]  # eye center 45% from the top
target_p_2 = [int(width / 2), int(height / 2)]  # don't want to change
target_p_3 = [mouth_x, int(height * 0.75)]  # mouth center 25% from the bottom
pts1 = np.float32([p_1, p_2, p_3])
pts2 = np.float32([target_p_1, target_p_2, target_p_3])
M = cv2.getAffineTransform(pts1, pts2)
output = cv2.warpAffine(image, M, (width, height))  # dsize is (width, height)
To clarify:
eye_x / eye_y is the location of the eye center.
The same applies to mouth_x / mouth_y for the mouth center.
target_p_1/2/3 are the target points.
Edit 2:
I see you are in trouble; I hope this time my suggestion will work for you:
There is another approach I can think of. You can perform a sort of "crop" of the image by picking 4 points; let's define them as the 4 points which wrap the face, and change the image perspective according to their new positions:
up_left = [x,y]
up_right = [...]
down_left = [...]
down_right = [...]
pts1 = np.float32([up_left,up_right,down_left,down_right])
pts2 = np.float32([[0,0],[300,0],[0,300],[300,300]])
M = cv2.getPerspectiveTransform(pts1,pts2)
dst = cv2.warpPerspective(img,M,(300,300))
So all you have to do is define those 4 points. My suggestion: calculate the contour around the face (which you already did) and after that add delta_x and delta_y to (or subtract them from) the coordinates.
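For illustration, one way to get those 4 points from the Haar detection above; a sketch, where the margin delta is a made-up value you would tune:
# derive the 4 wrapping points from the detected face box (x, y, w, h),
# padded by a hypothetical margin delta
delta = 20
up_left = [x - delta, y - delta]
up_right = [x + w + delta, y - delta]
down_left = [x - delta, y + h + delta]
down_right = [x + w + delta, y + h + delta]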
When I try to apply an inverse polar transformation to my image, the output falls outside of the output image. There are also some weird white patterns on the top. I tried to make the output image larger, but the circle is on the left side, so it didn't help.
I am trying to turn a line into a circle using the warpPolar function; for that, I first flip the line and give it a black area as shown in the image, then use cv2.warpPolar with the WARP_INVERSE_MAP flag.
My question is: how can I fully draw the circle and get its bounding box?
import cv2
import numpy as np

line = np.ones(shape=(20,475),dtype=np.uint8)*255
flipped = cv2.rotate(line,cv2.ROTATE_90_CLOCKWISE)
cv2.imshow('flipped',flipped)
h,w = flipped.shape
radius = int(h / (2*np.pi))
new_image = np.zeros(shape=(h,radius+w),dtype=np.uint8)
h2,w2 = new_image.shape
new_image[: ,w2-w:w2] = flipped
cv2.imshow('polar',new_image)
h,w = new_image.shape
center = (w/2,h)
output= cv2.warpPolar(new_image,center=center,maxRadius=radius,dsize=(1500,1500),flags=cv2.WARP_INVERSE_MAP + cv2.WARP_POLAR_LINEAR)
cv2.imshow('output',output)
cv2.waitKey(0)
Note: I am not getting the same result as you showed above when I tried the same code. Perhaps some lines of code are missing?
If I didn't misunderstand your problem, you are trying to get this result (if I am wrong, I will update the answer accordingly):
The only point you are missing is defining the center and radius. You are doing the inverse transform here; the input is created by you, not by warpPolar. Since you are defining the size as (1500,1500), you need to update the center and radius accordingly. Here is my code giving this result:
import cv2
import numpy as np
line = np.ones(shape=(20,475),dtype=np.uint8)*255
flipped = cv2.rotate(line,cv2.ROTATE_90_CLOCKWISE)
cv2.imshow('flipped',flipped)
h,w = flipped.shape
radius = int(h / (2*np.pi))
new_image = np.zeros(shape=(h,radius+w),dtype=np.uint8)
h2,w2 = new_image.shape
new_image[: ,w2-w:w2] = flipped
cv2.imshow('polar',new_image)
h,w = new_image.shape
center = (750,750)
maxRadius = 750
output = cv2.warpPolar(new_image,center=center,maxRadius=maxRadius,dsize=(1500,1500),flags=cv2.WARP_INVERSE_MAP + cv2.WARP_POLAR_LINEAR)
cv2.imshow('output',output)
cv2.waitKey(0)
Links to all images at the bottom
I have drawn a line over an arrow which captures the angle of that arrow. I would like to then remove the arrow, keep only the line, and use cv2.minAreaRect to determine the angle. So far I've got everything to work except removing the original arrow, which results in an incorrect angle generated by the cv2.minAreaRect bounding box.
Really, I just want the bold black line running through the arrow, not the arrow itself, to use to measure the angle. If anyone has an idea to make this work, or a simpler way, please let me know. Thanks!
Code:
import numpy as np
import cv2
image = cv2.imread("templates/a_15.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(image, 127, 255, 0)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cont = contours[0]
rows,cols = image.shape[:2]
[vx,vy,x,y] = cv2.fitLine(cont, cv2.DIST_L2,0,0.01,0.01)
leftish = int((-x*vy/vx) + y)
rightish = int(((cols-x)*vy/vx)+y)
line = cv2.line(image,(cols-1,rightish),(0,leftish),(0,255,0),10)
# thresholding
thresh = cv2.threshold(line, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# compute rotated bounding box based on all pixel values > 0 and
# use coordinates to compute a rotated bounding box of those coordinates
coordinates = np.column_stack(np.where(thresh > 0))
w = coordinates[0]
h = coordinates[1]
# Compute minimum rotated angle that contains entire image.
# Return angle values in the range [-90, 0).
# As the rectangle rotates clockwise, angle values increase towards 0.
# Once 0 is reached, angle is set back to -90 degrees.
angle = cv2.minAreaRect(coordinates)[-1]
# for angles less than -45 degrees, add 90 degrees to angle to take the inverse.
if angle < -45:
    angle = -(90 + angle)
else:
    angle = -angle
# rotate image
(h, w) = image.shape[:2]
center = (w // 2, h // 2) # image center
RM = cv2.getRotationMatrix2D(center, angle, 1.0)
rotated = cv2.warpAffine(image, RM, (w, h),
                         flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
# correction angle for validation
cv2.putText(rotated, "Angle {:.2f} degrees".format(angle),
            (10, 30), cv2.FONT_HERSHEY_DUPLEX, 0.9, (0, 255, 0), 2)
# output
print("[INFO] angle: {:.3f}".format(angle))
cv2.imshow("Line", line)
cv2.imshow("Input", image)
cv2.imshow("Rotated", rotated)
cv2.waitKey(0)
Images
original
current results
goal
Here's a possible solution. The main idea is to identify the "tip" and the "tail" of the arrow by approximating some key points. After you have identified both ends, you can draw a line joining them. It is also an advantage to know which of the endpoints is the tip, because that way you can measure the angle from a constant point.
There's more than one way to achieve this. I chose something I have applied in the past: I will use this approach to identify the endpoints of the overall shape. My assumption is that the tip will yield more points than the tail. After that, I'll cluster all the endpoints into two groups: tip and tail. I can use K-Means for that, as it will return the mean centers of both clusters. After that, we have our tip and tail points, which can easily be joined with a line. These are the steps:
Convert the image to grayscale
Get the skeleton of the image, to normalize the shape to a width of 1 pixel
Apply the method described in the link to get the arrow's endpoints
Divide the endpoints in two clusters and use K-Means to get their centers
Join both endpoints with a line
Let's see the code:
# imports:
import cv2
import numpy as np
# image path
path = "D://opencvImages//"
fileName = "CoXeb.png"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Grayscale conversion:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
grayscaleImage = 255 - grayscaleImage
# Extend the borders for the skeleton:
extendedImg = cv2.copyMakeBorder(grayscaleImage, 5, 5, 5, 5, cv2.BORDER_CONSTANT)
# Store a deep copy of the crop for results:
grayscaleImageCopy = cv2.cvtColor(extendedImg, cv2.COLOR_GRAY2BGR)
# Compute the skeleton:
skeleton = cv2.ximgproc.thinning(extendedImg, None, 1)
The first step is to get the skeleton of the arrow. As I said, this step is needed prior to the convolution-based method that identifies the endpoints of a shape. Computing the skeleton normalizes the shape to a one pixel width. However, sometimes, if the shape is too close to the "canvas" borders, the skeleton could show some artifacts. This is avoided with a border extension. The skeleton of the arrow is this:
Check that image out. If we identify the endpoints, the tip will exhibit at least 3 points, while the tail at least 1. That's handy - the tip will always have more points than the tail. If only we could detect those points... Luckily, we can:
# Threshold the image so that skeleton (white) pixels get a value
# of 10 and background pixels a value of 0:
_, binaryImage = cv2.threshold(skeleton, 128, 10, cv2.THRESH_BINARY)
# Set the end-points kernel:
h = np.array([[1, 1, 1],
[1, 10, 1],
[1, 1, 1]])
# Convolve the image with the kernel:
imgFiltered = cv2.filter2D(binaryImage, -1, h)
# Extract only the end-points pixels, those with
# an intensity value of 110:
binaryImage = np.where(imgFiltered == 110, 255, 0)
# The above operation converted the image to 32-bit float,
# convert back to 8-bit uint
binaryImage = binaryImage.astype(np.uint8)
This endpoint detecting method convolves the skeleton with a special kernel that identifies endpoints. It returns a binary image where all the endpoints have the value 110. After thresholding this mid-result, we get this image, which represents the arrow endpoints:
Nice, as you see, we can group the points into two clusters and get their cluster centers. Sounds like a job for K-Means, because that's exactly what it does. We first need to prepare our data, though, because K-Means operates on float arrays of a defined shape:
# Find the X, Y location of all the end-points
# pixels:
Y, X = binaryImage.nonzero()
# Reshape the arrays for K-means
Y = Y.reshape(-1,1)
X = X.reshape(-1,1)
Z = np.hstack((X, Y))
# K-means operates on 32-bit float data:
floatPoints = np.float32(Z)
# Set the convergence criteria and call K-means:
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(floatPoints, 2, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
# Set the cluster count, find the points belonging
# to cluster 0 and cluster 1:
cluster1Count = np.count_nonzero(label)
cluster0Count = np.shape(label)[0] - cluster1Count
print("Elements of Cluster 0: "+str(cluster0Count))
print("Elements of Cluster 1: " + str(cluster1Count))
The last two lines print the number of endpoints assigned to Cluster 0 and Cluster 1, respectively. That outputs this:
Elements of Cluster 0: 3
Elements of Cluster 1: 2
Just as expected - well, kinda. It seems that Cluster 0 is the tip and Cluster 1 the tail! But the tail actually got 2 points. If you look at the skeleton image closely, you'll see there's a small bifurcation at the tail. That's why we, in reality, got two points instead of just one. Alright, let's get the center points and draw them on the original input:
# Look for the cluster of max number of points
# That cluster will be the tip of the arrow:
maxCluster = 0
if cluster1Count > cluster0Count:
    maxCluster = 1
# Check out the centers of each cluster:
matRows, matCols = center.shape
# Store the ordered end-points here:
orderedPoints = [None] * 2
# Let's identify and draw the two end-points
# of the arrow:
for b in range(matRows):
    # Get cluster center:
    pointX = int(center[b][0])
    pointY = int(center[b][1])
    # Get the "tip"
    if b == maxCluster:
        color = (0, 0, 255)
        orderedPoints[0] = (pointX, pointY)
    # Get the "tail"
    else:
        color = (255, 0, 0)
        orderedPoints[1] = (pointX, pointY)
    # Draw it:
    cv2.circle(grayscaleImageCopy, (pointX, pointY), 3, color, -1)
cv2.imshow("End-Points", grayscaleImageCopy)
cv2.waitKey(0)
This is the resulting image:
The tip always gets drawn in red while the tail is drawn in blue. Very cool; let's store these points in the orderedPoints list and draw the final line on a new "canvas" with the same dimensions as the original image:
# Store the tip and tail points:
p0x = orderedPoints[1][0]
p0y = orderedPoints[1][1]
p1x = orderedPoints[0][0]
p1y = orderedPoints[0][1]
# Create a new "canvas" (image) using the input dimensions:
imageHeight, imageWidth = binaryImage.shape[:2]
newImage = np.zeros((imageHeight, imageWidth), np.uint8)
newImage = 255 - newImage
# Draw a line using the detected points:
(x1, y1) = orderedPoints[0]
(x2, y2) = orderedPoints[1]
lineColor = (0, 0, 0)
cv2.line(newImage, (x1, y1), (x2, y2), lineColor, thickness=2)
cv2.imshow("Detected Line", newImage)
cv2.waitKey(0)
The line overlaid on the original image and the new image containing only the line:
It sounds like you want to measure the angle of the line, but because you are measuring a line you drew on the original image, you must now filter out the original image to get an accurate measure of the line... which you drew with coordinates whose endpoints you already know?
I guess:
make a better filter?
draw the line in a blank image and detect angle there?
determine the angle from the known coordinates?
Since you were asking for just a line, I tried that... just made a blank image, drew your detected line on it and then used that downstream:
# height, width, channels come from your input image's shape
blankIm = np.ones((height, width, channels), dtype=np.uint8)
blankIm.fill(255)
line = cv2.line(blankIm, (cols - 1, rightish), (0, leftish), (0, 255, 0), 10)
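And for the third option, the angle can come straight from the endpoints you already computed; a sketch, reusing leftish, rightish and cols from your code:
import numpy as np

# angle of the fitted line from its two known endpoints,
# (0, leftish) and (cols - 1, rightish); image y grows downward,
# so negate if you want the usual math convention
angle = np.degrees(np.arctan2(rightish - leftish, cols - 1))
print("angle: {:.2f} degrees".format(angle))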
I'm trying to randomly rotate some images with annotations, and I'm trying to understand how to get the new point locations after rotation. My images have different shapes.
Here is an example of what I'm trying to do: this function calculates the new position according to this answer (here):
import math

def new_pixel(x, y, theta, X, Y):
    sin = math.sin(theta)
    cos = math.cos(theta)
    x_new = (x - X/2)*cos + (y - X/2)*sin + X/2
    y_new = -(x - X/2)*sin + (y - Y/2)*cos + Y/2
    return int(x_new), int(y_new)
The code to open the original image:
import cv2
from scipy import ndimage
from matplotlib import pyplot as plt

img = cv2.imread('D://ubun/1.jpg')
print(img.shape)
X, Y, c = img.shape
p1 = (124,291)
p2 = (168,291)
p3 = (169,391)
p4 = (125,391)
img1 = img.copy()
cv2.circle(img1, p1, 10, color=(255,0,0), thickness=2)
plt.imshow(img1)
The red dot is the point.
The code for rotation:
rotated = ndimage.rotate(img, 45)
print(rotated.shape)
p11 = new_pixel(p1[0],p1[1],45,X,Y)
p22 = new_pixel(p2[0],p2[1],45,X,Y)
p33 = new_pixel(p3[0],p3[1],45,X,Y)
p44 = new_pixel(p4[0],p4[1],45,X,Y)
cv2.circle(rotated, p11, 10, color=(255,0,0), thickness=2)
plt.imshow(rotated)
The image after rotation; you can see the point is not in the correct position:
I noticed that the image shape is different after rotation; does this affect the calculations?
This reply is really late, but even I faced the same issue. There are basically two issues with your code:
math.sin / math.cos take radians as input, not degrees. So just convert the angle to radians to get the right result.
X, Y: You are taking the height and width of the initial image, which was not rotated. This would be right if the resulting image were the same size as the initial image, but since the output is resized to fit the complete image after rotation, you need to take the new width and height. Note that you need the new width and height only in the terms added after the transformation. So the formula would be:
x_new = (x - old_width/2)*cos + (y - old_height/2)*sin + new_width/2
y_new = -(x - old_width/2)*sin + (y - old_height/2)*cos + new_height/2
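A sketch of the corrected helper with both fixes applied (the new width and height would come from rotated.shape; the parameter names are mine, not from the original post):
import math

def new_pixel_fixed(x, y, angle_deg, old_w, old_h, new_w, new_h):
    # fix 1: math.sin / math.cos expect radians, so convert from degrees
    theta = math.radians(angle_deg)
    sin, cos = math.sin(theta), math.cos(theta)
    # fix 2: rotate about the old center, then shift into the new, larger canvas
    x_new = (x - old_w / 2) * cos + (y - old_h / 2) * sin + new_w / 2
    y_new = -(x - old_w / 2) * sin + (y - old_h / 2) * cos + new_h / 2
    return int(x_new), int(y_new)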
I know this post is too late, but I hope it helps out anyone who faces the same issue.
I have a dataset of x-ray images that I am trying to clean by rotating the images so the arm is vertical, and cropping the image of any excess space. Here are some examples from the dataset:
I am currently working out the best way to determine the angle of the x-ray and rotate the image based on that.
My current approach is to detect the line of the side of the rectangle that the scan is in, using the Hough transform, and rotate the image based on that.
I tried to run the Hough transform on the output of a Canny edge detector, but this doesn't work so well for images where the edge of the rectangle is blurred, like in the first image.
I can't use cv's box detection, as sometimes the rectangle around the scan has an edge off screen.
So I currently use adaptive thresholding to find the edge of the box, then median filter it and try to find the longest line, but sometimes the wrong line is the longest and the image gets rotated completely wrong.
Adaptive thresholding is used because some scans have different brightnesses.
The current implementation I have is:
def get_lines(img):
    # threshold
    thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 15, 4.75)
    median = cv2.medianBlur(thresh, 3)
    # detect lines
    lines = cv2.HoughLines(median, 1, np.pi/180, 175)
    return sorted(lines, key=lambda x: x[0][0], reverse=True)
def rotate(image, angle):
    (h, w) = image.shape[:2]
    (cX, cY) = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY
    return cv2.warpAffine(image, M, (nW, nH))

def fix_rotation(input):
    lines = get_lines(input)
    rho, theta = lines[0][0]
    return rotate(input, theta * 180 / np.pi)
and produces the following results:
When it goes wrong:
I was wondering if there are any better techniques to use in order to improve the performance of this, and what the best way would be to crop the images after they have been rotated?
The idea is to use the blob of the arm itself and fit an ellipse around it. Then, extract its major axis. I quickly tested the idea in Matlab, not OpenCV. Here's what I did; you should be able to use OpenCV's equivalent functions to achieve similar outputs.
First, compute the threshold value of your input via Otsu. Then add some bias to the threshold value to find a better segmentation and use this value to threshold the image.
In pseudo-code:
// the bias value
threshBias = 0.4;
// get the binary threshold via Otsu:
thresholdLevel = graythresh( grayInput, "otsu" );
// subtract the bias from the original value
thresholdLevel = thresholdLevel - threshBias * thresholdLevel;
// get the fixed binary image:
binaryImage = imbinarize( grayInput, thresholdLevel );
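In OpenCV, the same biased-Otsu idea might look like this (a sketch; gray is assumed to be the grayscale input):
import cv2

gray = cv2.imread('xray.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input
# Otsu computes the threshold level; keep the level, ignore the output image
otsu_level, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# subtract a 40% bias from Otsu's level, then re-threshold with the biased level
bias = 0.4
_, binary = cv2.threshold(gray, otsu_level - bias * otsu_level, 255, cv2.THRESH_BINARY)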
After small blob filtering, this is the output:
Now, get the contours/blobs and fit an ellipse for each contour. Check out the OpenCV example here: https://docs.opencv.org/3.4.9/de/d62/tutorial_bounding_rotated_ellipses.html
You end up with two ellipses:
We are looking for the biggest ellipse, the one with the biggest area and the biggest major and minor axis. I used the width and height of each ellipse to filter the results. The target ellipse is then colored in green. Finally, I get the major axis of the target ellipse, here colored in yellow:
Now, to implement these ideas in OpenCV you have these options:
Use fitEllipse to find the ellipses. The return value of this function is a RotatedRect object, which stores the center, size and rotation angle of the fitted ellipse. See the sketch below.
Instead of fitting an ellipse you could try using minAreaRect, which finds a rotated rectangle of the minimum area enclosing a blob.
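A sketch of the first option, assuming binary is the thresholded arm mask from above:
import cv2

# find the blobs and fit an ellipse to the largest one
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)
# fitEllipse returns a RotatedRect: center, size (the two axes) and rotation angle
(cx, cy), (w, h), angle = cv2.fitEllipse(largest)
major_axis = max(w, h)
print("ellipse orientation: {:.1f} degrees".format(angle))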
You can use image moments to calculate the rotation angle.
Using OpenCV's moments function, calculate the second-order central moments to construct a covariance matrix, and then obtain the orientation as shown in the Image moment wiki page.
Obtain the normalized central moments nu20, nu11 and nu02 from cv2.moments. Then the orientation is calculated as:
0.5 * arctan(2 * nu11 / (nu20 - nu02))
Please refer to the given link for details.
You can use the raw image itself or the preprocessed one for the calculation of orientation. See which one gives you better accuracy and use it.
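A sketch of the moments-based orientation (binary is assumed to be the preprocessed mask; arctan2 also handles the nu20 == nu02 case):
import cv2
import numpy as np

m = cv2.moments(binary, binaryImage=True)
# orientation from the second-order normalized central moments
theta = 0.5 * np.arctan2(2 * m['nu11'], m['nu20'] - m['nu02'])
print("orientation: {:.1f} degrees".format(np.degrees(theta)))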
As for the bounding box: once you rotate the image (assuming you used the preprocessed one), get all the non-zero pixel coordinates of the rotated image and calculate their upright bounding box using OpenCV's boundingRect, as sketched below.
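A sketch of that, assuming rotated is the rotated, preprocessed image:
import cv2
import numpy as np

# collect all non-zero pixel coordinates as (x, y) points
ys, xs = np.nonzero(rotated)
points = np.column_stack((xs, ys)).astype(np.int32)
# upright bounding box around the arm, then crop
x, y, w, h = cv2.boundingRect(points)
cropped = rotated[y:y + h, x:x + w]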
I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon
filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I) # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))
# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most of the documents, except with some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e. it does not differentiate between 180 and 0, or between 90 and 270). So I get a lot of upside-down documents.
Here is an example:
The resulting image that I get is the same as the input image.
Is there any suggestion to detect whether an image is upside down, using OpenCV and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, Tesseract can detect the orientation. However, when the image has few lines of text, the orientation angle suggested by Tesseract is usually wrong. So this cannot be a 100% solution.
Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score-keeping method. Score each image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.
# Rotate the image around in a circle
scores = []
angle = 0
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)
    # Crop the center 1/3rd of the image (roi is filled with text)
    h, w = img.shape
    buffer = min(h, w) - int(min(h, w) / 1.15)
    roi = img[int(h/2 - buffer):int(h/2 + buffer), int(w/2 - buffer):int(w/2 + buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer * 2, buffer * 2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # High score --> Zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is smaller; the smaller one is right-side-up
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach; a few lines of your example image:
Blue: Original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
Here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line) / 2:
        lower_peaks += 1
print(lower_peaks / len(line_starts))
This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines of text in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Different fonts might also result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then, to use it to deskew images (taken from the homepage):
from alyn import Deskew
d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offset_angle_in_degrees_to_control_orientation')
d.run()
Note that Alyn is only for deskewing text.