I have a function to draw rectangles on an image. The input image is always the same, but after I run the function to draw and plot the image with contours, running it again keeps the old contour. The original image is read at the beginning of the function every time, so I don't understand why it isn't reinitialized as if no contour had been drawn previously.
I'm new to Python and OpenCV. Please let me know if you have any suggestions for clearing the image before applying contours again.
def get_last_col(date_headers):
    n_tables = len(date_headers)
    print(n_tables)
    image = date_headers[0][0]
    for header in date_headers:
        h_d = header[1]
        page_h = image.shape[0]
        last_col = h_d[h_d.bloc_x == max(h_d.bloc_x)]
        last_col['width'] = last_col.bloc_xw - last_col.bloc_x
        min_w = int(min(last_col.width))
        # X of the bottom-left corner date in the header
        bottom_left = min(int(min(last_col['bloc_x']) - 50), int(min(last_col['bloc_x']) - min_w))
        # Y of the bottom-most date element
        bottom_top = int(last_col.Y_DATE) - 20
        # X of the bottom-right corner date in the header
        bottom_right = int(max(h_d['bloc_xw']) + min_w) - 10
        cv2.rectangle(image, (bottom_left, bottom_top), (bottom_right, page_h), (36, 255, 12), 2)
    plt.figure(figsize=(30, 30))
    plt.imshow(image)
Examples after running the function once and then again with a different contour: [first run image] [second run image]
I figured it out by using a copy of the image (date_headers[0][0].copy()).
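That makes sense because cv2.rectangle draws in place, so every run modified the array stored in date_headers[0][0]. A minimal sketch of the fix, changing only the line at the top of the function:
    image = date_headers[0][0].copy()  # work on a copy so the original array is never modified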
How can I crop images that look like this and save them as 3 different images?
The issue is that the images differ in size and are not proportional, so I want to write code that dynamically cuts off the black borders but not the black parts inside the picture.
Here is the desired outcome:
Below is the sample code I've made which works only for one specific image.
from PIL import Image
im = Image.open(r"image.jpg")
# Setting the points for cropped image1
# im1 = im.crop((left, top, right, bottom))
im1 = im.crop((...))
im2 = im.crop((...))
im3 = im.crop((...))
im1.save(r"image1.jpg")
im2.save(r"image2.jpg")
im3.save(r"image3.jpg")
Finally I've found the solution. Here is what I did:
from PIL import Image, ImageChops
def RemoveBlackBorders(img):
    # Compare the image against a background filled with its top-left pixel colour
    bg = Image.new(img.mode, img.size, img.getpixel((0, 0)))
    diff = ImageChops.difference(img, bg)
    # Boost the difference so near-black borders are also treated as background
    diff = ImageChops.add(diff, diff, 2.0, -100)
    bbox = diff.getbbox()
    if bbox:
        return img.crop(bbox)
    return img  # nothing to crop (e.g. a uniform image)
# Open an image in RGB mode
im = Image.open(r"C:\Path\Image.jpg")
# Remove the black borders
im = RemoveBlackBorders(im)
# Get the horizontal midpoint from the size
width, height = im.size
mwidth = width / 2
# Cut the three figures relative to the midpoint
# crop((x, y of top left, x, y of bottom right))
im1 = im.crop((0, 0, mwidth - 135, height))
im2 = im.crop((mwidth - 78, 0, mwidth + 84, height))
im3 = im.crop((mwidth + 135, 0, width, height))
I found the border-removal function here.
Although the solution is not completely dynamic, it still solves my problem with ~90% accuracy. But I believe there should be a more universal approach to this problem.
If the areas always have the same size and the same top and bottom coordinates, the following should work:
The coordinates for the crops can be retrieved by calculating the sums per rows and per columns, then analyzing them.
import cv2
import numpy as np
im = cv2.imread(image_path)
sum_of_rows = np.sum(im, axis=(1,2))
sum_of_cols = np.sum(im, axis=(0,2))
The top and bottom can be found from the sum of each row (each sum being R+G+B across the row, so the value is zero for a fully black row). Then look for the first and the last values that differ from zero; these indicate the top and the bottom.
top = np.argmax(sum_of_rows > 0)
bottom = top + np.argmax(sum_of_rows[top:]==0)
The same can be done with the sum of each column, but here you have to check for multiple left and right values, as sketched below.
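Under the assumption that the three areas are separated by fully black columns, the column analysis could be sketched like this:
non_black = sum_of_cols > 0                      # columns containing anything other than black
edges = np.diff(non_black.astype(int))           # +1 marks a left edge, -1 marks a right edge
lefts = np.where(edges == 1)[0] + 1
rights = np.where(edges == -1)[0] + 1
# Crop each area with the shared top/bottom and its own left/right
crops = [im[top:bottom, l:r] for l, r in zip(lefts, rights)]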
When I try to apply an inverse polar transformation to my image, the output falls outside of the output image. There are also some weird white patterns at the top. I tried to make the output image larger, but the circle is on the left side, so it didn't help.
I am trying to turn a line into a circle using the warpPolar function. For that I first flip the line and give it a black area as shown in the image, then use cv2.warpPolar with the WARP_INVERSE_MAP flag.
My question is: how can I fully draw the circle and get its bounding box?
import cv2
import numpy as np

line = np.ones(shape=(20,475),dtype=np.uint8)*255
flipped = cv2.rotate(line,cv2.ROTATE_90_CLOCKWISE)
cv2.imshow('flipped',flipped)
h,w = flipped.shape
radius = int(h / (2*np.pi))
new_image = np.zeros(shape=(h,radius+w),dtype=np.uint8)
h2,w2 = new_image.shape
new_image[: ,w2-w:w2] = flipped
cv2.imshow('polar',new_image)
h,w = new_image.shape
center = (w/2,h)
output= cv2.warpPolar(new_image,center=center,maxRadius=radius,dsize=(1500,1500),flags=cv2.WARP_INVERSE_MAP + cv2.WARP_POLAR_LINEAR)
cv2.imshow('output',output)
cv2.waitKey(0)
Note: I am not getting the same result as you show above when I try the same code. Perhaps you are missing some lines of code?
If I didn't misunderstand your problem, you are trying to get this result (if I am wrong, I will update the answer accordingly):
The only point you are missing is the definition of the center and radius. You are doing the inverse transform here; the input is created by you, not by warpPolar. Since you define the output size as (1500,1500), you need to update the center and radius accordingly. Here is my code giving this result:
import cv2
import numpy as np
line = np.ones(shape=(20,475),dtype=np.uint8)*255
flipped = cv2.rotate(line,cv2.ROTATE_90_CLOCKWISE)
cv2.imshow('flipped',flipped)
h,w = flipped.shape
radius = int(h / (2*np.pi))
new_image = np.zeros(shape=(h,radius+w),dtype=np.uint8)
h2,w2 = new_image.shape
new_image[: ,w2-w:w2] = flipped
cv2.imshow('polar',new_image)
h,w = new_image.shape
center = (750,750)
maxRadius = 750
output = cv2.warpPolar(new_image,center=center,maxRadius=maxRadius,dsize=(1500,1500),flags=cv2.WARP_INVERSE_MAP + cv2.WARP_POLAR_LINEAR)
cv2.imshow('output',output)
cv2.waitKey(0)
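To also get the bounding box of the drawn circle (the second part of the question, not covered by the code above), one option is to take the bounding rectangle of all non-zero pixels in the warped output:
# Bounding box of the circle: bounding rectangle of all non-zero pixels
points = cv2.findNonZero(output)
x, y, w, h = cv2.boundingRect(points)
cropped = output[y:y+h, x:x+w]
cv2.imshow('bounding box', cropped)
cv2.waitKey(0)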
I work at a studio that does school photos and we are trying to make a script to eliminate the job of cropping each photo to a template. The photos we work with are fairly uniform but they vary in resolution and head position a bit. I took up the mantle of trying to write the script with my fairly limited Python knowledge and through a lot of trial and error and online resources I think I have got most of the way there.
At the moment I am trying to figure out the best way to crop the image from the NumPy array with the head where I want it, and I just can't find a good flexible solution. The head needs to be positioned slightly differently for pose 1 and pose 2, so it needs to be easy to change on the fly (I'll probably implement some sort of simple GUI to input things like that, but for now I can just change the code).
I also need to be able to change the output resolution of the photo so they are all uniform (2000x2500). Anyone have any ideas?
At the moment this is my current code, it just saves the detected face square:
import cv2
import os.path
import glob

# Cascade path
cascPath = 'haarcascade_frontalface_default.xml'
# Create the haar cascade
faceCascade = cv2.CascadeClassifier(cascPath)

# Check for output folder and create it if it's not there
if not os.path.exists('output'):
    os.makedirs('output')

# Read images
images = glob.glob('*.jpg')
for c, i in enumerate(images):
    image = cv2.imread(i, 1)
    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Find face(s) using the cascade
    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.1,    # size of groups
        minNeighbors=5,     # how many neighbouring detections are needed for a valid face
        minSize=(500, 500)  # minimum face size in pixels
    )
    # Output the number of faces found in the image
    print('Found {0} faces!'.format(len(faces)))
    # Crop to the detected face
    for (x, y, w, h) in faces:
        imgCrop = image[y:y+h, x:x+w]
    if len(faces) > 0:
        # Save images to the output folder with the original name
        cv2.imwrite('output/' + i, imgCrop)
I can crop using it like this:
# Crop padding
left = 300
right = 300
top = 400
bottom = 1000

for (x, y, w, h) in faces:
    imgCrop = image[y-top:y+h+bottom, x-left:x+w+right]
but that outputs fairly random resolutions that change based on the image resolution.
TL;DR
To set a new output resolution you can use cv2.resize. Resizing loses pixel information, so choose a suitable interpolation method.
The cropped image comes from face_recognition in RGB format, while cv2.imwrite expects BGR, so convert before saving.
crop = cv2.resize(src=crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)
crop = cv2.cvtColor(crop, cv2.COLOR_RGB2BGR)  # face_recognition loads RGB; imwrite expects BGR
cv2.imwrite("image-1.png", crop)
Suggestion:
One approach is using Python's face_recognition library.
The approach uses two sample images for training.
It then predicts the faces in the next image based on the training images.
For instance, the following are the training images:
We want to predict the faces in the image below:
We get the facial encodings of the training images and apply them to the next image:
import face_recognition
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw
# Load a sample picture and learn how to recognize it.
first_image = face_recognition.load_image_file("images/ex.jpg")
first_face_encoding = face_recognition.face_encodings(first_image)[0]
# Load a second sample picture and learn how to recognize it.
second_image = face_recognition.load_image_file("images/index.jpg")
sec_face_encoding = face_recognition.face_encodings(second_image)[0]
# Create arrays of known face encodings and their names
known_face_encodings = [
    first_face_encoding,
    sec_face_encoding
]
print('Learned encoding for', len(known_face_encodings), 'images.')
# Load an image with an unknown face
unknown_image = face_recognition.load_image_file("images/babes.jpg")
# Find all the faces and face encodings in the unknown image
face_locations = face_recognition.face_locations(unknown_image)
face_encodings = face_recognition.face_encodings(unknown_image, face_locations)
# Convert the image to a PIL-format image so that we can draw on top of it with the Pillow library
# See http://pillow.readthedocs.io/ for more about PIL/Pillow
pil_image = Image.fromarray(unknown_image)
# Create a Pillow ImageDraw Draw instance to draw with
draw = ImageDraw.Draw(pil_image)
# Loop through each face found in the unknown image
for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
    matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
    face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
    best_match_index = np.argmin(face_distances)
    draw.rectangle(((left, top), (right, bottom)), outline=(0, 0, 255), width=5)
# Remove the drawing library from memory as per the Pillow docs
del draw
# Display the resulting image
plt.imshow(pil_image)
plt.show()
The output will be:
The above is my suggestion. When you resize the image to a new resolution there will be some pixel loss, therefore you need to use an interpolation method.
For instance: after finding the face locations, select the coordinates in the original image.
# Add after draw.rectangle function.
crop = unknown_image[top:bottom, left:right]
Set the new resolution to 2000 x 2500 and interpolate with cv2.INTER_LANCZOS4.
Possible question: why cv2.INTER_LANCZOS4?
Of course, you can select whatever you like, but in this post cv2.INTER_LANCZOS4 was suggested.
crop = cv2.resize(src=crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)
Save the image:
crop = cv2.cvtColor(crop, cv2.COLOR_RGB2BGR)  # face_recognition loads RGB; imwrite expects BGR
cv2.imwrite("image-1.png", crop)
The outputs are around 4.3 MB, therefore I can't display them here.
From the final result, we clearly see and identify faces. The library precisely finds the faces in the image.
Here is what you can do:
Either use training images from your own set, or use the example above.
Apply the face recognition to each image using the trained face locations and save the results to the output directory, as sketched below.
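A rough sketch of that batch loop, assuming the face_recognition library from the example above and the 2000x2500 output size (the directory layout and padding values are placeholders to adapt):
import glob
import os
import cv2
import face_recognition

os.makedirs('output', exist_ok=True)
for path in glob.glob('*.jpg'):
    image = face_recognition.load_image_file(path)  # RGB array
    for (top, right, bottom, left) in face_recognition.face_locations(image):
        # Pad the detected face box; the padding values are placeholders
        crop = image[max(top - 400, 0):bottom + 1000, max(left - 300, 0):right + 300]
        crop = cv2.resize(crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)
        crop = cv2.cvtColor(crop, cv2.COLOR_RGB2BGR)  # imwrite expects BGR
        cv2.imwrite(os.path.join('output', os.path.basename(path)), crop)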
Here is how I got it to crop how I wanted; this is added right below the "output number of faces" line:
# Get the face position and output the values into variables, might not be needed but I did it
for (x, y, w, h) in faces:
    xdis = x
    ydis = y
    w = w
    h = h

# Get the scale value by dividing the wanted head height by the detected head height
ws = 600/w
hs = 600/h
# Scale the image to get the head to the right size, uses bilinear interpolation by default
scale = cv2.resize(image, (0,0), fx=hs, fy=ws)

# Calculate the head position for the given values
sxdis = int(xdis*ws)  # applying scale to x distance and turning it into an integer
sydis = int(ydis*hs)  # applying scale to y distance and turning it into an integer
sycent = sydis+300    # adding half the head height to get the center
ystart = sycent-700   # subtract where you want the head center to be in pixels, this is for the vertical
yend = ystart+2500    # add whatever you want the vertical resolution to be
xcent = sxdis+300     # adding half the head height to get the center
xstart = xcent-1000   # subtract where you want the head center to be in pixels, this is for the horizontal
xend = xstart+2000    # add whatever you want the horizontal resolution to be

# Crop the image
cropped = scale[ystart:yend, xstart:xend]
It's a mess, but it works exactly how I wanted it to work.
I ended up going with OpenCV instead of switching to face_recognition because of speed, but I might switch over if I can get multithreading to work in face_recognition.
I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon
filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I) # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))
# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most of the documents, except with some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e. it does not differentiate between 180 and 0, nor between 90 and 270). So I get a lot of upside-down documents.
Here is an example:
The resulted image that I get is the same as the input image.
Is there any suggestion for detecting whether an image is upside down using OpenCV and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, Tesseract can detect the orientation. However, when the image has only a few lines, the orientation angle suggested by Tesseract is usually wrong, so this cannot be a 100% solution.
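If Tesseract is applicable, the correction could be wired up roughly like this; it assumes the OSD output contains a "Rotate:" line giving the clockwise rotation needed to make the page upright (the file name is a placeholder):
import re
import cv2
import pytesseract

def correct_orientation(file_name):
    img = cv2.imread(file_name)
    osd = pytesseract.image_to_osd(img)
    # Parse the suggested clockwise rotation from the OSD report
    rotate = int(re.search(r'Rotate: (\d+)', osd).group(1))
    if rotate == 90:
        img = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
    elif rotate == 180:
        img = cv2.rotate(img, cv2.ROTATE_180)
    elif rotate == 270:
        img = cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)
    return img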
Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score-keeping method. Score each image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.
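For reference, the omitted rotate and sum_rows helpers might be sketched roughly like this (the exact implementations are in the linked code; the input path is a placeholder), and scores needs to start as an empty list:
import cv2
import numpy as np

def rotate(img, angle):
    # Rotate the image around its center, keeping the original size
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w/2, h/2), angle, 1)
    return cv2.warpAffine(img, M, (w, h))

def sum_rows(img):
    # Collapse every row into a single summed value (one-column image)
    return cv2.reduce(img, 1, cv2.REDUCE_SUM, dtype=cv2.CV_32F)

src = cv2.imread('/path/to/scan.png', cv2.IMREAD_GRAYSCALE)
scores = []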
# Rotate the image around in a circle
angle = 0
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)
    # Crop the center 1/3rd of the image (roi is filled with text)
    h, w = img.shape
    buffer = min(h, w) - int(min(h, w)/1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # High score --> zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
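The area_to_top_of_text helper is also omitted above; a rough sketch of what it might look like (the exact version is in the linked code): for every column, fill from the top of the page down to the first non-black pixel and return that filled mask.
def area_to_top_of_text(img):
    # Mask of the area between the top of the page and the first non-black pixel per column
    bg = np.zeros_like(img)
    for col in range(img.shape[1]):
        rows = np.nonzero(img[:, col])[0]
        if len(rows) > 0:
            bg[:rows[0], col] = 255
    return img, bg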
Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach for a few lines of your example image:
Blue: Original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
Here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line)/2:
        lower_peaks += 1
print(lower_peaks / len(line_starts))
This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then to use it to deskew images (taken from the homepage):
from alyn import Deskew
d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offset_angle_in_degrees_to_control_orientation')
d.run()
Note that Alyn is only for deskewing text.
I'm trying to find all the arrows on the original image using the below template image and draw a rectangle around them. I do not want to use Sift/Surf/homography/etc. Only template matching. I only want to use 1 template and not generate 360 individual 1 degree rotation templates as reference.
Template:
Original Image:
This is my code so far
import cv2
import numpy as np
import imutils

template = cv2.imread("C:\\Users\\Desktop\\All\\images\\template.png")
template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
_, template = cv2.threshold(template, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

image = cv2.imread("C:\\Users\\Desktop\\All\\images\\original image.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

MATCH_THRESH = 4000000

for degrees in range(0, 360, 1):
    rotate = imutils.rotate_bound(template, degrees)
    w, h = rotate.shape[::-1]
    res = cv2.matchTemplate(image, rotate, cv2.TM_SQDIFF)
    loc = np.where(res < MATCH_THRESH)
    for pt in zip(*loc[::-1]):
        rect = cv2.rectangle(image, pt, (pt[0] + w, pt[1] + h), (0,255,0), 4)
        cv2.imshow("matches", image)
        cv2.imshow("rectangle", rect)
        cv2.waitKey(500)
        cv2.destroyAllWindows()
        print('Match for deg{}, pt({}, {}), sqdiff {}'.
            format(degrees, pt[0], pt[1], res[pt[1], pt[0]]))
I take both .pngs, convert them to gray, and then to black and white using Otsu, which should help with the template matching.
Next I rotate my template image in 1-degree steps through 360 degrees, starting at 0. "degrees" stores the current rotation and is applied to the template with rotate = imutils.rotate_bound(template, degrees).
Then I run cv2.matchTemplate for each rotation, save the locations of points below a certain threshold (TM_SQDIFF, so lower is better), and draw a rectangle around each found match based on the rotated template's size.
This is where I'm running into an issue: I can't seem to get it to display the rectangles. I know it's finding the points because it prints them. I've tried every combination of cv2.imshow. Do you see something I don't?
Thank you.
You are not able to see the coloured rectangles because the picture you are drawing them on has been converted to grayscale. I suggest you store a copy before you convert to grayscale and draw the rectangles on that copy.
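A minimal sketch of that change, reusing the variable names from the question (the colour copy is my own addition):
# Keep an untouched colour copy for drawing, taken before the grayscale conversion
image = cv2.imread("C:\\Users\\Desktop\\All\\images\\original image.png")
display = image.copy()  # colour copy used only for drawing
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# ... template matching on the grayscale image as before, then draw on the colour copy ...
for pt in zip(*loc[::-1]):
    cv2.rectangle(display, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 4)
cv2.imshow("matches", display)
cv2.waitKey(500)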