How to get orientation of text from an image? - python

I'm trying to get the orientation of text from an image.
I have 8 types of images with different orientations; see all of the types in the image below (I will put a link to a repository where you can get all the input images):
I was using these libraries to detect the orientation of the text in an image.
import pytesseract as tess
from PIL import Image
my_image = Image.open(temp_image_path)
osd = tess.image_to_osd(my_image)
print(osd)
Output:
This is what I got:
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 2.77
Script: Cyrillic
Script confidence: 2.88
However, I don't get why a vertical page with vertical text (type II in my image) sometimes produces an output like this:
Rotate: 90 or Rotate: 270.
I used OpenCV and TensorFlow; they helped me find similarities, but not to identify whether my text has a different orientation.
This is the repository on GitHub:
Click here to view the repository with the inputs

Following #stateMachine's recommendation, detecting the footer position and aspect ratio is a good idea. You can try to do so by detecting squares in the image. This should be fairly easy to do with OpenCV, see an example.
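The linked example isn't reproduced here, but a minimal square-detection sketch along those lines could look like the following (the input path and the area/aspect-ratio thresholds are assumptions to tune on your own plans):
import cv2
import numpy as np
img = cv2.imread("plan.png")  # hypothetical input path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
contours, _ = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
squares = []
for cnt in contours:
    # Approximate the contour and keep 4-sided, convex, reasonably large shapes
    approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
    if len(approx) == 4 and cv2.isContourConvex(approx) and cv2.contourArea(approx) > 1000:
        x, y, w, h = cv2.boundingRect(approx)
        if 0.8 < w / float(h) < 1.2:  # roughly square aspect ratio
            squares.append((x, y, w, h))
# The position of the detected square(s), e.g. a footer or title block,
# can then hint at the page orientation.
print(squares)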
If you have some labeled images you can also try #StereoMatching's idea. In this case, using a very simple HOG descriptor as the image representation plus a Support Vector Machine for the classification should do the trick. You can use OpenCV's HOGDescriptor implementation and scikit-learn's SVC.
Assuming you have a nice load() function for your (small) dataset, you can do something like this:
import cv2
from sklearn import svm
from sklearn.model_selection import cross_val_score
### Load the dataset
path_images, labels = load(dataset)
### HOG descriptor options ###
winSize = (112, 112)
blockSize = (16, 16)
blockStride = (8, 8)
cellSize = (8, 8)
nbins = 9
derivAperture = 1
winSigma = 4.
histogramNormType = 0
L2HysThreshold = 0.2
gammaCorrection = 0
nlevels = 64
hog = cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins, derivAperture,
                        winSigma, histogramNormType, L2HysThreshold, gammaCorrection, nlevels)
### Get the dataset representation
# Resize every image to winSize so all descriptors have the same length,
# and flatten each descriptor so scikit-learn gets one feature vector per sample.
hogds = list(map(lambda p: hog.compute(cv2.resize(cv2.imread(p), winSize)).flatten(),
                 path_images))
### Get a sense of the performance
clf = svm.SVC(class_weight='balanced')
print(cross_val_score(clf, hogds, labels, cv=5))
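If the cross-validation scores look acceptable, a natural follow-up (sketch only; new_scan.png is a hypothetical path) is to fit the classifier on the full dataset and predict the orientation class of a new image:
# Fit on the full dataset, then describe a new image the same way the
# training images were described (resize to winSize, compute HOG, flatten).
clf.fit(hogds, labels)
new_desc = hog.compute(cv2.resize(cv2.imread("new_scan.png"), winSize)).flatten()
print(clf.predict([new_desc]))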

Related

extracting coordinates from computer vision inference

I converted this computer vision model (YOLOv7) to an ONNX-type model that can be used with the OpenVINO toolkit. This model has the characteristics I am after, based on how it is used in other applications I have read about.
I think my question is super basic and comes from not understanding computer vision well enough. I'm just curious whether someone can give me some tips on the basics of how to loop through the model output for "bounding boxes" and draw them with OpenCV.
Using this on CPU with pip-installed OpenVINO:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from openvino.runtime import Core
model_path = f"./yolov7.xml"
ie_core = Core()
def model_init(model_path):
    model = ie_core.read_model(model=model_path)
    compiled_model = ie_core.compile_model(model=model, device_name="CPU")
    input_keys = compiled_model.input(0)
    output_keys = compiled_model.output(0)
    return input_keys, output_keys, compiled_model
input_key, output_keys, compiled_model = model_init(model_path)
# resize the image so it works with the model dimensions
image = cv2.resize(image, (width, height))
image = image.transpose((2,0,1))
image = image.reshape(1,3, height,width)
# Run inference on image, trying .output(1) first
boxes = compiled_model([image])[compiled_model.output(1)]
The code works and outputs an array, but what does this data contain? For some reason I thought there would be a confidence value I could use to filter out bad predictions, as well as bounding box coordinates.
If I print(compiled_model), it outputs what I think is the model architecture:
<CompiledModel:
inputs[
<ConstOutput: names[input.1] shape{1,3,640,640} type: f32>
]
outputs[
<ConstOutput: names[812] shape{1,25200,85} type: f32>,
<ConstOutput: names[588] shape{1,3,80,80,85} type: f32>,
<ConstOutput: names[669] shape{1,3,40,40,85} type: f32>,
<ConstOutput: names[750] shape{1,3,20,20,85} type: f32>
]>
Does this tell me anything about the model output, like what the data would contain? Or the boxes.shape:
Which returns:
(1, 3, 80, 80, 85)
for box in boxes:
    print(box)
This is just NumPy arrays with lots of float data. I'm just curious whether anyone can help me understand, at a high level, what I need to learn in order to draw bounding boxes around features inside the image.
From my replication, your code fails with a "NameError: name 'image' is not defined" error. In your output, each ConstOutput entry only represents a port/node of your model. To ensure your model works, run your yolov7.xml file with the OpenVINO Benchmark Python Tool; you should not receive any errors.
Among the OpenVINO samples, you may refer to the Object Detection Python Demo source code to learn how the OpenVINO Inference Engine API is used for creating bounding boxes and how to handle the model. Here is another example of creating bounding boxes:
for box in boxes:
    # Pick the confidence factor from the last place in the array.
    conf = box[-1]
    if conf > threshold:
        # Convert floats to ints and multiply the corner positions of each box
        # by the x and y ratios.
        # If the bounding box is found at the top of the image,
        # position the upper box bar a little lower to make it visible on the image.
        (x_min, y_min, x_max, y_max) = [
            int(max(corner_position * ratio_y, 10)) if idx % 2
            else int(corner_position * ratio_x)
            for idx, corner_position in enumerate(box[:-1])
        ]
        # Draw a box based on the position; the arguments of cv2.rectangle are:
        # image, start_point, end_point, color, thickness.
        rgb_image = cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max),
                                  colors["green"], 3)

How to treat different size images in ML pipeline

I'm asking a general question here about image processing applied to a machine learning pipeline. In this post, I will use ML to mean any algorithm that is not deep learning (i.e., it doesn't use a neural network).
I'm developing a classifier to catalog .png images of different clothes. I have labels (for each image I know the category), so it's a supervised learning problem.
My objective is to use PCA to reduce the problem's dimensionality and then use bag of visual words to perform the classification. I'm using Python for this project.
The problem is that each photo has a different size and a different width-to-height ratio (so I can't simply resize them, because I wouldn't get a single height value shared by all images).
My (inelegant) solution is to fix the width at 200 px and then pad each image with rows of zeros (each image becomes a NumPy array with maximum_h rows, each row width pixels long).
Here is the script:
#help function to convert images in array
def get_image(image_path: str, resize=True, w=300):
    """
    :param image_path: string, path of the image
    :param resize: boolean, if True the image is resized. Default: True
    :param w: integer, specify the width of the resized image
    :return: numpy array of the greyscale version of the image
    """
    try:
        image = Image.open(image_path).convert("L")
        if resize:
            wpercent = (w / float(image.size[0]))
            hsize = int((float(image.size[1]) * float(wpercent)))
            image = image.resize((w, hsize), Image.ANTIALIAS)
        #pixel_values = np.array(image.getdata())
        return image
    except:
        #AI19/04442.png corrupted
        #AI18/02971.png corrupted
        #print(image_path)
        return None
def extract_images(paths: list, categories: list, w: int, maximum_h: int):
    A = np.zeros([len(paths), w * maximum_h])
    y = []
    counter = 0
    for image_path, label in tqdm(zip(paths, categories)):
        im = get_image(image_path, w=w)
        if im:
            #adapt images to fit
            h, w = np.array(im).shape
            delta_h = maximum_h - h
            zeros_ = np.zeros((delta_h, w), dtype=int)
            im = np.concatenate((im, zeros_), axis=0)
            A[counter, :] = im.reshape(1, -1)
            y.append(label)
            counter += 1
        else:
            continue
    return (A, y)
The problem here is that the classifier performs badly (20%), because I add a significant number of zeros to each image, which increases the dimensionality but doesn't add information.
Looking at the largest eigenvectors from PCA, I see that a lot of information is concentrated in these "padding" areas (which confirms my impression).
Is there a better way to handle different-size images in Python?

Detect if an OCR text image is upside down

I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon
filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I) # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))
# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most documents, except with some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e., it does not distinguish between 180 and 0, or between 90 and 270). So I get a lot of upside-down documents.
Here is an example:
The resulting image that I get is the same as the input image.
Are there any suggestions for detecting whether an image is upside down using OpenCV and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but only when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, Tesseract can detect the orientation. However, when the image has few lines, the orientation angle suggested by Tesseract is usually wrong, so this cannot be a 100% solution.
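For completeness, here is a small hedged sketch of acting on that OSD output. It only parses the "Rotate:" line from the text shown above; double-check the rotation direction convention on your own scans:
import re
import cv2
import pytesseract
img = cv2.imread(file_name)
osd = pytesseract.image_to_osd(img)
# Pull the suggested rotation (in degrees) out of the OSD text
rotate = int(re.search(r"Rotate: (\d+)", osd).group(1))
# Map it to an OpenCV rotation; verify the direction on your own data
codes = {90: cv2.ROTATE_90_CLOCKWISE,
         180: cv2.ROTATE_180,
         270: cv2.ROTATE_90_COUNTERCLOCKWISE}
if rotate in codes:
    img = cv2.rotate(img, codes[rotate])
cv2.imwrite('oriented.png', img)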
Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score-keeping method: score each image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.
# Rotate the image around in a circle
angle = 0
scores = []  # needs to exist before the loop appends to it
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)
    # Crop the center 1/3rd of the image (roi is filled with text)
    h, w = img.shape
    buffer = min(h, w) - int(min(h, w) / 1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # High score --> Zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
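area_to_top_of_text is among the omitted functions; as a rough sketch of the idea only (not the author's implementation), the "area above the text" could be measured per column like this:
import numpy as np
def area_above_text(binary_img):
    # binary_img: text pixels are non-zero (white), background is zero
    h, _ = binary_img.shape
    has_text = binary_img > 0
    # Row index of the first text pixel in each column; h if the column is empty
    first_text = np.where(has_text.any(axis=0), has_text.argmax(axis=0), h)
    # Summing those indices approximates the filled area from the page top
    return int(first_text.sum())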
Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach on a few lines of your example image:
Blue: Original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
Here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line)/2:
        lower_peaks += 1
print(lower_peaks / len(line_starts))
This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Also, different fonts might result in different distributions. You can try this on a few images and see if the approach works; I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then use it to deskew images (taken from the homepage):
from alyn import Deskew
d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offset_angle_in_degrees_to_control_orientation')
d.run()
Note that Alyn is only for deskewing text.

Uniformity of color and texture in image

I am new to the field of deep learning and have a problem determining whether two images have uniform color and texture. For example, I have a
Master image -
Now, with respect to this image, I need to determine whether the following images have uniform texture and color distributions -
image 1 -
image 2 -
image 3 -
I need to develop an algorithm which will evaluate these 3 images against the master image. The algorithm should approve image 1, and reject image 2 because of its color and image 3 because of its color and texture uniformity.
My approach to the problem was to analyze the image directly for texture. I found that the Local Binary Patterns method was good among the texture recognition methods (but I am not sure). I used its skimage implementation with OpenCV in Python and found that the method worked.
from skimage import feature
import numpy as np
import cv2
import matplotlib.pyplot as plt
class LocalBinaryPatterns:
    def __init__(self, numPoints, radius):
        # store the number of points and radius
        self.numPoints = numPoints
        self.radius = radius
    def describe(self, image, eps=1e-7):
        # compute the Local Binary Pattern representation
        # of the image, and then use the LBP representation
        # to build the histogram of patterns
        lbp = feature.local_binary_pattern(image, self.numPoints,
                                           self.radius, method="uniform")
        (hist, _) = np.histogram(lbp.ravel(),
                                 bins=np.arange(0, self.numPoints + 3),
                                 range=(0, self.numPoints + 2))
        # normalize the histogram
        hist = hist.astype("float")
        hist /= (hist.sum() + eps)
        # return the histogram of Local Binary Patterns
        return hist
desc = LocalBinaryPatterns(24, 8)
image = cv2.imread("main.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
hist = desc.describe(gray)
plt.plot(hist, 'b-')
plt.ylabel('Feature Vectors')
plt.show()
It detected the features and built a histogram of feature vectors. I plotted the histogram using matplotlib and clearly found that the texture features of images 1 and 2 were very similar to the master image, while the texture features of image 3 did not match.
Then I started analyzing the images for their color. I plotted the color histograms using OpenCV as follows:
import cv2
from matplotlib import pyplot as plt
def draw_image_histogram(image, channels, color='k'):
    hist = cv2.calcHist([image], channels, None, [256], [0, 256])
    plt.plot(hist, color=color)
    plt.xlim([0, 256])
def show_color_histogram(image):
    for i, col in enumerate(['b', 'g', 'r']):
        draw_image_histogram(image, [i], color=col)
    plt.show()
show_color_histogram(cv2.imread("test1.jpg"))
I found that the color histogram of image 1 matched the master image, while the color histograms of images 2 and 3 did not. In this way I figured out that image 1 was matching and images 2 and 3 were not.
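Rather than eyeballing the plots, the two comparisons can also be scored numerically. This is only a sketch: compare_to_master is a hypothetical helper, the 0.15 / 0.9 thresholds are assumptions to tune on your own images, and it reuses the LocalBinaryPatterns class from above:
import cv2
import numpy as np
def compare_to_master(master_path, test_path, desc):
    master = cv2.imread(master_path)
    test = cv2.imread(test_path)
    # Texture: chi-squared distance between LBP histograms (smaller = more similar)
    lbp_m = desc.describe(cv2.cvtColor(master, cv2.COLOR_BGR2GRAY))
    lbp_t = desc.describe(cv2.cvtColor(test, cv2.COLOR_BGR2GRAY))
    texture_dist = 0.5 * np.sum((lbp_m - lbp_t) ** 2 / (lbp_m + lbp_t + 1e-7))
    # Color: correlation between normalized BGR histograms (closer to 1 = more similar)
    hist_m = cv2.calcHist([master], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    hist_t = cv2.calcHist([test], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    cv2.normalize(hist_m, hist_m)
    cv2.normalize(hist_t, hist_t)
    color_sim = cv2.compareHist(hist_m, hist_t, cv2.HISTCMP_CORREL)
    return texture_dist, color_sim
# Example usage with assumed thresholds
texture_dist, color_sim = compare_to_master("main.png", "test1.jpg", desc)
print("texture ok:", texture_dist < 0.15, "color ok:", color_sim > 0.9)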
But this is a pretty simple approach, and I have no idea how many false positives it will produce. Moreover, I don't know whether this approach to the problem is the best one.
I also want this to be done by a single, robust algorithm like a CNN (but it should not be computationally too expensive). However, I have no experience with CNNs. Should I train a CNN with master images? Please point me in the right direction. I also came across LBCNNs; can they solve the problem? And what other, better approaches are there?
Thank you so much for the help
CNNs are good at capturing the underlying features and distribution of a dataset. But they need a lot of data (hundreds of thousands of examples) to learn and extract those features, which is a very expensive task. Also, for high-resolution images, a CNN needs more parameters to extract those features, which in turn demands even more data.
If you have a large dataset, you can prefer a CNN, which can capture tiny bits of information such as these fine textures. Otherwise, the classical methods (like the ones you have used) also work well.
There is also a method called transfer learning, where we use a pre-trained model (trained on a similar dataset) and fine-tune it on our small dataset. If you can find such a model, that can be another option.
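As a minimal transfer-learning sketch (assuming TensorFlow/Keras is available and the images are arranged one folder per class under a hypothetical dataset/ directory), one could freeze an ImageNet-pretrained backbone and train only a small head:
import tensorflow as tf
# Load images from a folder-per-class layout (assumed path and binary labels)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/", image_size=(224, 224), batch_size=32)
# Reuse an ImageNet-pretrained backbone and freeze its weights
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet", pooling="avg")
base.trainable = False
# Small trainable head on top for the "uniform vs. non-uniform" decision
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)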

How to acquire depth map from stereo - KITTI dataset

After trying the example given in the OpenCV documentation:
When I tried the same code on a KITTI image pair, I get this:
The code I am using right now looks like this; changing the parameters in StereoBM_create did not help much:
import numpy as np
import cv2
from matplotlib import pyplot as plt
imgL = cv2.imread('000002_left.png',0)
imgR = cv2.imread('000002_right.png',0)
stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
#stereo = cv2.StereoBM_create(numDisparities=64, blockSize=17)
disparity = stereo.compute(imgL,imgR)
cv2.imwrite('depth_map.png', disparity)
disp_v2 = cv2.imread('depth_map.png')
disp_v2 = cv2.applyColorMap(disp_v2, cv2.COLORMAP_JET)
plt.imshow(disp_v2)
cv2.imwrite('depth_map_coloured.png', disp_v2)
plt.show()
Question is: How can I make the depth map better?
In my experience, StereoBM (OpenCV) doesn't work well with KITTI images, maybe because KITTI images are much more complex.
But I achieved to get good results using this:
https://github.com/ialhashim/DenseDepth
You should adjust the parameters of the stereo matcher in OpenCV.
This is a function inside a class I created. You can see that I adjust some parameters such as the number of disparities, the minimum disparity, etc.:
def get_stereo_map(self, image_idx):
    left_RGB = self.get_left_RGB(image_idx)  # left RGB image
    right_RGB = self.get_right_RGB(image_idx)  # right RGB image
    # compute depth map from stereo
    stereo = cv2.StereoBM_create()
    stereo.setMinDisparity(0)
    num_disparities = 16 * 5
    stereo.setNumDisparities(num_disparities)
    stereo.setBlockSize(15)
    stereo.setSpeckleRange(16)
    # stereo.setSpeckleWindowSize(45)
    stereo_depth_map = stereo.compute(
        cv2.cvtColor(np.array(left_RGB), cv2.COLOR_RGB2GRAY),
        cv2.cvtColor(np.array(right_RGB), cv2.COLOR_RGB2GRAY))
    # depth = focal length (pixels) * baseline (m) / disparity;
    # divide by 16 to get true disparities
    stereo_depth_map = (self.storage.focal_pix_RGB * self.storage.baseline_m_RGB) \
        / (stereo_depth_map / 16)
    stereo_depth_map = DataParser.crop_redundant(stereo_depth_map)
    return stereo_depth_map
For full code refer to my repo: https://github.com/janezlapajne/kitty-stereo-dataset-parser
Ground truth from LiDAR and the stereo distance map are also included. Hope it helps someone.
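Separately from the answers above, a hedged sketch worth trying is OpenCV's StereoSGBM, which in many reports handles KITTI scenes better than StereoBM; the parameter values below are just starting points to tune:
import cv2
imgL = cv2.imread('000002_left.png', 0)
imgR = cv2.imread('000002_right.png', 0)
# Semi-global block matching; parameter values are rough starting points
block_size = 5
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,      # must be divisible by 16
    blockSize=block_size,
    P1=8 * block_size ** 2,  # smoothness penalties for small/large disparity changes
    P2=32 * block_size ** 2,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)
disparity = stereo.compute(imgL, imgR).astype('float32') / 16.0  # fixed-point -> pixels
norm = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype('uint8')
cv2.imwrite('disparity_sgbm.png', norm)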
