The code I've produced to detect and correct skew is giving me inconsistent results. I'm currently working on a project that uses OCR text extraction on images (via Python and OpenCV), so removing skew is key if accurate results are desired. My code uses cv2.minAreaRect to detect skew.
The images I'm using are all identical (and will be in the future), so I'm unsure what is causing these inconsistencies. I've included two sets of before and after images (including the skew value from cv2.minAreaRect) where I applied my code: one showing successful removal of skew, and one where skew was not removed (it looks like even more skew was added).
Image 1 Before (-87.88721466064453)
Image 1 After (successful deskew)
Image 2 Before (-5.766754150390625)
Image 2 After (unsuccessful deskew)
My code is below. Note: I've worked with many more images than those I've included here. The detected skew thus far has always been in the ranges [-10, 0) or (-90, -80], so I attempted to account for this in my code.
import cv2
import numpy as np

# img is the input BGR image, loaded earlier with cv2.imread
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray = cv2.bitwise_not(img_gray)
thresh = cv2.threshold(img_gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# Coordinates of all foreground pixels; minAreaRect gives the angle
# of the minimal bounding rectangle around them
coords = np.column_stack(np.where(thresh > 0))
angle = cv2.minAreaRect(coords)[-1]

if angle < 0 and angle >= -10:
    angle = -angle  # intended to undo skew for values in [-10, 0) by simply rotating with the opposite sign
else:
    angle = (90 + angle) / 2

(h, w) = img.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle, 1.0)
deskewed = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
I've looked through various posts and articles to find an adequate solution, but have been unsuccessful. This post was the most helpful in understanding the skew values, but even then I couldn't get very far.
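For reference, the normalization such posts usually suggest (my paraphrase of the common convention for OpenCV versions where cv2.minAreaRect reports angles in (-90, 0], not code from any specific post) looks like this, which differs from my handling above:

# Map the minAreaRect angle to the rotation needed to deskew,
# assuming the angle is reported in (-90, 0]
if angle < -45:
    angle = -(90 + angle)   # rectangle is closer to vertical
else:
    angle = -angle          # rectangle is closer to horizontal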
A very good text deskew tool can be found in Python Wand, which uses ImageMagick. It is based upon the Radon transform.
Form 1:
Form 2:
from wand.image import Image
from wand.display import display

with Image(filename='form1.png') as img:
    img.deskew(0.4 * img.quantum_range)
    img.save(filename='form1_deskew.png')
    display(img)

with Image(filename='form2.png') as img:
    img.deskew(0.4 * img.quantum_range)
    img.save(filename='form2_deskew.png')
    display(img)
Form 1 deskewed:
Form 2 deskewed:
I already answered this here: How to deskew a scanned text page with ImageMagick?
The following piece of code can help you deskew the image:
import numpy as np
from skimage import io
from skimage.transform import rotate
from skimage.color import rgb2gray
from deskew import determine_skew
from matplotlib import pyplot as plt

def deskew(_img):
    image = io.imread(_img)
    grayscale = rgb2gray(image)
    angle = determine_skew(grayscale)
    # skimage's rotate returns floats in [0, 1]; scale back to 8-bit
    rotated = rotate(image, angle, resize=True) * 255
    return rotated.astype(np.uint8)

def display_before_after(_original):
    plt.subplot(1, 2, 1)
    plt.imshow(io.imread(_original))
    plt.subplot(1, 2, 2)
    plt.imshow(deskew(_original))
    plt.show()

display_before_after('img_35h.jpg')
Before:
After:
Reference and Source: http://aishelf.org/deskew/
I am trying to detect as many circles as possible in my images using the following code:
import cv2 as cv

# image is the input BGR image and width its pixel width (both defined earlier)
maxRadius = int(1.2 * (width / 16) / 2)
minRadius = int(0.9 * (width / 16) / 2)
gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY)
circles = cv.HoughCircles(image=gray,
                          method=cv.HOUGH_GRADIENT,
                          dp=1.2,
                          minDist=2 * minRadius,
                          param1=70,
                          param2=0.9,
                          minRadius=minRadius,
                          maxRadius=maxRadius
                          )
Although it works for some of the images, there are a few exceptions for which it doesn't.
Below we can see that for two different images that represent the same kind of experiment, my algorithm yields very different results.
How can I fix this? Should I apply some sort of filter on the images first to enhance the contrast?
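For what it's worth, one standard way to enhance local contrast before detection is CLAHE; a minimal sketch of what I have in mind (untested on these images, and the clipLimit/tileGridSize values are just guesses):

import cv2 as cv

# Contrast-limited adaptive histogram equalization on the grayscale image
clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
gray_eq = clahe.apply(gray)
# gray_eq would then be passed to cv.HoughCircles instead of gray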
EDIT: added original image:
This solution may or may not work on other images, but it does work on the one you posted. You might want to look for the "sweet spot" for the adaptiveThreshold and HoughCircles parameters so that it works with other images as well.
import numpy as np
import cv2
import matplotlib.pyplot as plt

rgb = cv2.imread('/path/to/your/image/cells_0001.jpeg')
gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
imh, imw = gray.shape

# Adaptive threshold copes better with uneven illumination than a global one
th = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 11, 2)

maxRadius = int(1.2 * (imw / 16) / 2)
minRadius = int(0.9 * (imw / 16) / 2)
circles = cv2.HoughCircles(image=th,
                           method=cv2.HOUGH_GRADIENT,
                           dp=1.2,
                           minDist=2 * minRadius,
                           param1=70,
                           param2=25,
                           minRadius=minRadius,
                           maxRadius=maxRadius
                           )

out_img = rgb.copy()
for (x, y, r) in circles[0]:
    # draw the circle in the output image
    cv2.circle(out_img, (int(x), int(y)), int(r), (0, 255, 0), 1)
plt.imshow(out_img)
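One caveat worth noting: cv2.HoughCircles returns None when nothing is detected, so on other images it may be safer to guard the drawing loop, e.g.:

if circles is not None:
    for (x, y, r) in circles[0]:
        cv2.circle(out_img, (int(x), int(y)), int(r), (0, 255, 0), 1)
else:
    print('no circles found')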
I have a few hundred images (scanned documents), most of which are skewed. I wanted to deskew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon

filename = 'path_to_filename'

# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape

# If the resolution is high, resize the image to reduce processing time.
if w > 640:
    I = cv2.resize(I, (640, int((h / w) * 640)))

I = I - np.mean(I)  # Demean; make the brightness extend above and below zero

# Do the radon transform
sinogram = radon(I)

# Find the RMS value of each projection (one per angle); the "busiest"
# rotation, where the projection lines up with the alternating dark
# text and white lines, has the highest RMS
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))

# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w / 2, h / 2), 90 - rotation, 1)
dst = cv2.warpAffine(img, M, (w, h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most of the documents, except for some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e. it does not distinguish between 0 and 180, or between 90 and 270). So I get a lot of upside-down documents.
Here is an example:
The resulting image that I get is the same as the input image.
Is there any suggestion for detecting whether an image is upside down using OpenCV and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but only when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, Tesseract can detect the orientation. However, when the image has few lines, the orientation angle it suggests is usually wrong, so this cannot be a 100% solution.
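The OSD output is a plain text report; if you need the rotation programmatically, it can be parsed out, for example (assuming the usual "Rotate: N" line in Tesseract's OSD output):

import re
import cv2
import pytesseract

# Parse the suggested rotation out of the OSD text report
osd = pytesseract.image_to_osd(cv2.imread(file_name))
rotate = int(re.search(r'Rotate: (\d+)', osd).group(1))
print('suggested rotation:', rotate)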
A Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score-keeping method. Score each image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.
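Since rotate and sum_rows are among the omitted functions, minimal stand-ins might look like this (my assumption of what they do, not the author's exact code; display_data, which presumably just visualizes intermediate results, is still omitted):

import cv2
import numpy as np

def rotate(src, angle):
    # Rotate the image around its center, keeping the original size
    h, w = src.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(src, M, (w, h))

def sum_rows(roi):
    # Sum the pixel values of each row; blank rows between text lines sum to ~0
    return np.sum(roi, axis=1)

scores = []  # the loop below appends one score per angle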
# Rotate the image around in a circle
angle = 0
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)
    # Crop the center 1/3rd of the image (roi is filled with text)
    h, w = img.shape
    buffer = min(h, w) - int(min(h, w) / 1.15)
    roi = img[int(h/2 - buffer):int(h/2 + buffer), int(w/2 - buffer):int(w/2 + buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer * 2, buffer * 2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # High score --> Zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
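For completeness, area_to_top_of_text is another omitted helper; a rough stand-in consistent with the description above (my assumption, not the author's code) could be:

def area_to_top_of_text(img):
    # For each column, fill from the top of the page down to the first
    # non-black pixel; the filled mask approximates the area above the text
    bg = np.zeros_like(img)
    for col in range(img.shape[1]):
        nonzero = np.nonzero(img[:, col])[0]
        if len(nonzero) > 0:
            bg[:nonzero[0], col] = 255
    return img, bg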
Assuming you already ran the angle correction on the image, you can try the following to find out if it is flipped:
1. Project the corrected image onto the y-axis, so that you get a 'peak' for each line. Important: there are actually almost always two sub-peaks!
2. Smooth this projection by convolving with a Gaussian in order to get rid of fine structure, noise, etc.
3. For each peak, check whether the stronger sub-peak is on top or at the bottom.
4. Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value, which gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach on a few lines of your example image:
Blue: original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
Here's some code that does this:
import cv2
import numpy as np

# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]

# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)

# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]

# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line) / 2:
        lower_peaks += 1
print(lower_peaks / len(line_starts))
This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.
Note that this approach might break badly if there are pictures or anything else not organized in lines (maybe math or figures). Another problem would be too few lines, resulting in bad statistics.
Also, different fonts might result in different distributions. You can try this on a few images and see if the approach works; I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then, to use it to deskew images (taken from the homepage):
from alyn import Deskew

d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offset_angle_in_degrees_to_control_orientation')
d.run()
Note that Alyn is only for deskewing text.
I have an image that I want to process. I'm using OpenCV and skimage. My goal is to find the distribution of the red dots around the barycenter of all the dots. I proceed as follows: first I select the color, then I binarize the image that I obtain. Eventually, I would just count the red pixels that fall in rings of a certain width around that barycenter, in order to get the average distribution as a function of radius, assuming cylindrical symmetry.
My issue is that I have no idea how to find the position of the barycenter.
I would also like to know if there is a short way to count the red pixels in the rings.
Here is my code:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, filters, measure, color, external
First, I load the image:
sph = cv2.imread('image_sper.jpg')
sph = cv2.cvtColor(sph, cv2.COLOR_BGR2RGB)
plt.imshow(sph)
plt.show()
I want to select the red color. Following https://realpython.com/python-opencv-color-spaces/, I convert the image to HSV and use a mask.
hsv_sph = cv2.cvtColor(sph, cv2.COLOR_RGB2HSV)
light_red = (1, 100, 100)
dark_red = (18, 255, 255)
mask = cv2.inRange(hsv_sph, light_red, dark_red)
result = cv2.bitwise_and(sph, sph, mask=mask)
And here is the result:
plt.imshow(result)
plt.show()
Now I binarize the image, since it'll be easier to process afterwards.
red_image = result[:,:,1]
red_th = filters.threshold_otsu(red_image)
red_mask = red_image > red_th
red_mask.dtype  # bool
io.imshow(red_mask)
And here we are:
What I would like now is some help to find the barycenter of the white pixels.
Thanks.
Edit: the binarization gives the image boolean values, False/True, for the pixels. I don't know how to transform them into 0/1 pixels. If False were 0 and True 1, code to find the barycenter would be:
np.shape(red_mask)   # (321, 316)

bari = 0
barj = 0
N = 0
for i in range(321):
    for j in range(316):
        bari = bari + red_mask[i,j] * i
        barj = barj + red_mask[i,j] * j
        N = N + red_mask[i,j]
bari = bari / N
barj = barj / N
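As a side note, booleans already behave as 0/1 in NumPy arithmetic, so the loops above can be vectorized, and the ring counting can be done with a radial histogram. A sketch of both ideas (my own suggestion, assuming red_mask from above):

import numpy as np

# Barycenter: mean row/column index of all True pixels
bari, barj = np.argwhere(red_mask).mean(axis=0)

# Radial distribution: distance of each red pixel from the barycenter,
# binned into rings of fixed width
ii, jj = np.indices(red_mask.shape)
radius = np.hypot(ii - bari, jj - barj)
ring_width = 5
counts, edges = np.histogram(radius[red_mask],
                             bins=np.arange(0, radius.max() + ring_width, ring_width))
print(counts)  # red-pixel count per ring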
Another question that should have been asked here: http://answers.opencv.org/questions/
But, let's go!
The process that I have implemented uses mostly structural analysis (https://docs.opencv.org/3.3.1/d3/dc0/group__imgproc__shape.html#ga17ed9f5d79ae97bd4c7cf18403e1689a)
First I got your image:
import cv2
import matplotlib.pyplot as plt
import numpy as np
from skimage import io, filters, measure, color, external

sph = cv2.imread('points.png')
ret, thresh = cv2.threshold(sph, 200, 255, cv2.THRESH_BINARY)
Then I applied a morphological opening (an erosion followed by a dilation) for noise reduction and converted the result:
kernel = np.ones((2,2), np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)
opening = cv2.cvtColor(opening, cv2.COLOR_BGR2GRAY)
opening = cv2.convertScaleAbs(opening)
Then I used cv::findContours(InputOutputArray image, OutputArrayOfArrays contours, OutputArray hierarchy, int mode, int method, Point offset=Point()) to find all the blobs.
After that, I just calculated the center of each region and did a weighted average based on the contour area. This way, I got the centroid of the dots (X: 143.4202820443726, Y: 154.56471750651224).
# Note: in OpenCV 4, findContours returns only (contours, hierarchy)
im2, contours, hierarchy = cv2.findContours(opening, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

areas = []
centersX = []
centersY = []
for cnt in contours:
    areas.append(cv2.contourArea(cnt))
    M = cv2.moments(cnt)
    centersX.append(int(M["m10"] / M["m00"]))
    centersY.append(int(M["m01"] / M["m00"]))

full_areas = np.sum(areas)

# Weight each blob's center by its share of the total area
acc_X = 0
acc_Y = 0
for i in range(len(areas)):
    acc_X += centersX[i] * (areas[i] / full_areas)
    acc_Y += centersY[i] * (areas[i] / full_areas)
print(acc_X, acc_Y)

cv2.circle(sph, (int(acc_X), int(acc_Y)), 5, (255, 0, 0), -1)
plt.imshow(sph)
plt.show()
I'm trying to find all the arrows in the original image using the template image below and draw a rectangle around each of them. I do not want to use SIFT/SURF/homography/etc., only template matching. I also only want to use one template, not generate 360 individual 1-degree-rotation templates as references.
Template:
Original Image:
This is my code so far:
import cv2
import numpy as np
import imutils

template = cv2.imread("C:\\Users\\Desktop\\All\\images\\template.png")
template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
_, template = cv2.threshold(template, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

image = cv2.imread("C:\\Users\\Desktop\\All\\images\\original image.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

MATCH_THRESH = 4000000

for degrees in range(0, 360, 1):
    rotate = imutils.rotate_bound(template, degrees)
    w, h = rotate.shape[::-1]
    res = cv2.matchTemplate(image, rotate, cv2.TM_SQDIFF)
    # TM_SQDIFF: lower values mean better matches
    loc = np.where(res < MATCH_THRESH)
    for pt in zip(*loc[::-1]):
        rect = cv2.rectangle(image, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 4)
        cv2.imshow("matches", image)
        cv2.imshow("rectangle", rect)
        cv2.waitKey(500)
        cv2.destroyAllWindows()
        print('Match for deg {}, pt ({}, {}), sqdiff {}'.format(degrees, pt[0], pt[1], res[pt[1], pt[0]]))
I take both PNGs and convert them to grayscale and then to black and white using Otsu, which should help with the template matching.
Next I rotate my template image in 1-degree steps through 360 degrees, starting at 0. degrees stores the angle I'm currently at and controls the rotation of my template via rotate = imutils.rotate_bound(template, degrees).
Then I run cv2.matchTemplate for each angle, save the locations of points whose squared difference is below a certain threshold, and draw a rectangle around each found match based on the rotated template's size.
This is where I'm running into an issue: I can't seem to get it to display the rectangles. I know it's finding the points, because the matches are being printed. I've tried every combination of cv2.imshow. Do you see something I don't?
Thank you.
You are not able to see the coloured rectangles because the picture you are drawing them on has been converted to grayscale (and then thresholded to black and white). I suggest you store a copy before converting to grayscale and draw the rectangles on that copy.
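A minimal sketch of that change against your code (keeping your loop structure; image_color is my name for the copy):

image_color = cv2.imread("C:\\Users\\Desktop\\All\\images\\original image.png")
image = cv2.cvtColor(image_color, cv2.COLOR_BGR2GRAY)
_, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# ... run the matchTemplate loop on the binary image as before,
# but draw and display on the colour copy:
for pt in zip(*loc[::-1]):
    cv2.rectangle(image_color, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 4)
cv2.imshow("matches", image_color)
cv2.waitKey(500)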
Here is some dummy code:
import numpy as np
import skimage.transform

def radon(img):
    theta = np.linspace(-90., 90., 180, endpoint=False)
    sinogram = skimage.transform.radon(img, theta=theta, circle=True)
    return sinogram
I need to get the sinogram this code outputs without using skimage, but I am unable to find any such implementation in Python. Can you provide one using only OpenCV, NumPy or other lightweight libraries?
Edit: I need this to get the dominant angle of the image. I am trying to fix the tilt before character segmentation for an OCR system. Examples are given below:
On the left are the inputs, and on the right are the desired outputs.
Edit 2: If you can suggest any other way to get this output, that will help too.
Edit 3: Some sample images:
https://drive.google.com/open?id=0B2MwGW-_t275Q2Nxb3k3TGg4N1U
Well, I had a similar problem. After spending some time googling the issue, I found a solution that worked for me. I hope it helps.
import numpy as np
import cv2
from skimage.transform import radon

filename = 'your_filename'

# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape

# If the resolution is high, resize the image to reduce processing time.
if w > 640:
    I = cv2.resize(I, (640, int((h / w) * 640)))

I = I - np.mean(I)  # Demean; make the brightness extend above and below zero

# Do the radon transform
sinogram = radon(I)

# Find the RMS value of each projection (one per angle); the "busiest"
# rotation, where the projection lines up with the alternating dark
# text and white lines, has the highest RMS
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))

# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w / 2, h / 2), 90 - rotation, 1)
dst = cv2.warpAffine(img, M, (w, h))
cv2.imwrite('rotated.jpg', dst)
Test:
Original image:
Rotated image: (rotation degree is -9°)
CREDITS:
Detecting rotation and line spacing of image of page of text using Radon transform
One caveat: after rotating the image, you will get some black borders. For your case, I think they will not affect the OCR processing.
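If skimage must be avoided entirely, as the original question asks, the sinogram can also be approximated with plain OpenCV and NumPy by rotating the image and summing columns for each angle. A rough sketch (my own, not a drop-in replacement for skimage.transform.radon):

import cv2
import numpy as np

def radon_opencv(img, angles=np.arange(180)):
    # Approximate Radon transform: for each angle, rotate the image
    # and take the column sums as that angle's projection
    h, w = img.shape
    center = (w / 2, h / 2)
    projections = []
    for angle in angles:
        M = cv2.getRotationMatrix2D(center, float(angle), 1.0)
        rotated = cv2.warpAffine(img, M, (w, h))
        projections.append(rotated.sum(axis=0))
    # One column per angle, roughly matching skimage's sinogram layout
    return np.array(projections).T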