Tesseract OCR extraction - Python

I am building an OCR pipeline: I run object detection on the images, call the detection function to get bounding boxes, and crop the images based on those boxes. The challenge I am facing is that the cropped images are too small for Tesseract to extract data from, which is hurting accuracy.
import tensorflow as tf
from PIL import Image, ImageOps

# Crop the detected region (offsets first, then target height/width)
cropped_image = tf.image.crop_to_bounding_box(image, y_min, x_min, y_max - y_min, x_max - x_min)
# Write the crop as a grayscale JPEG with Pillow
img_pil = Image.fromarray(cropped_image.numpy())
score = bscores[idx] * 100
file_name = OUTPUT_PATH + "somefilename"
img_pil = ImageOps.grayscale(img_pil)
img_pil.save(file_name, quality=95, subsampling=0)
I am running a super-resolution model over the cropped images to improve image quality before passing them to Tesseract, but I am still not able to achieve good accuracy.
import os
from cv2 import dnn_superres

# Create an SR object
sr = dnn_superres.DnnSuperResImpl_create()
# Define model path, e.g. "EDSR_x4.pb"
model_path = os.path.join(base_path, model + ".pb")
# Extract model name: the text between the last path separator and '_'
model_name = model_path.split('\\')[-1].split('_')[0].lower()
# Extract model scale from the "x4" part of the filename
model_scale = int(model_path.split('\\')[-1].split('_')[1].split('.')[0][1])
# Read and configure the model
sr.readModel(model_path)
sr.setModel(model_name, model_scale)
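For reference, here is a minimal sketch of how the loaded model could be applied and the result handed to Tesseract. It assumes pytesseract is installed; the --psm 7 setting (treat the crop as a single text line) is an assumption that often helps on small crops, not something from the original code:

import cv2
import pytesseract

# Upscale the small crop with the super-resolution model loaded above
upscaled = sr.upsample(cv2.imread(file_name))
# PSM 7 treats the image as a single text line; tune this for your data
text = pytesseract.image_to_string(upscaled, config="--psm 7")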
How can I fix this cropped-image issue so that data extraction is more accurate?

Have you tried OCRing and then cropping, rather than the reverse? It may take longer but it is likely going to be more accurate.
I have a lot of experience using ocrmypdf with pdfplumber and regex to parse PDF documents into spreadsheets, and this is the process I generally follow:
import os
import re

import pandas as pd
import pdfplumber

# OCR the PDF in place
os.system('ocrmypdf --force-ocr --deskew path/to/file.pdf path/to/file.pdf')

# Extract the text of every page
pdf_text = ''
with pdfplumber.open('path/to/file.pdf') as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        pdf_text = pdf_text + '\n' + text

# Find the ids that anchor each record, then slice the text between them
ids = re.findall('id: (.*)', pdf_text)
y = pdf_text.split('\n')
ds = []
for i, id1 in enumerate(ids):
    d = {}
    idx1 = [idx for idx, s in enumerate(y) if id1 in s][0]
    try:
        id2 = ids[i + 1]
        idx2 = [idx for idx, s in enumerate(y) if id2 in s][0]
        z = y[idx1:idx2]
    except IndexError:
        z = y[idx1:]
    chunk = '\n'.join(z)
    # may need to add if/else or try/except per field
    d['value'] = re.findall('Model name: (.*)', chunk)[0]
    # rinse and repeat for the other fields
    ds.append(d)
df = pd.DataFrame(ds)
Not sure how helpful that will be, but it may give you some inspiration.

Related

Video datasets in Python

I am new to deep learning algorithms and machine learning, as well as to working with data. I am currently trying to work with an annotated video dataset, and I tried to find a simple example of how I should get started. I am aware that to work with a video dataset, we first need to extract the images from the videos and then do the image processing. However, as I am new, it is still difficult for me to understand the steps. I came across this link; it is great, but the data is really large and cannot be downloaded onto my computer.
https://www.analyticsvidhya.com/blog/2019/09/step-by-step-deep-learning-tutorial-video-classification-python/
Any suggestions for walk-through examples I can use to build my understanding and learn how to deal with these datasets?
Here is a way to create a synthetic video dataset quickly:
import numpy as np
import skvideo.io as sk

# Create sample video data (here the object moves towards the left)
num_vids = 5
num_imgs = 50
img_size = 50
min_object_size = 1
max_object_size = 5

for i_vid in range(num_vids):
    imgs = np.zeros((num_imgs, img_size, img_size))  # set background to 0
    vid_name = "vid" + str(i_vid) + ".mp4"
    # Random rectangle size and starting position
    w, h = np.random.randint(min_object_size, max_object_size, size=2)
    x = np.random.randint(0, img_size - w)
    y = np.random.randint(0, img_size - h)
    i_img = 0
    # Shift the rectangle one pixel left per frame until it reaches the edge
    while x > 0:
        imgs[i_img, y : y + h, x : x + w] = 255  # set rectangle as foreground
        x = x - 1
        i_img = i_img + 1
    sk.vwrite(vid_name, imgs.astype(np.uint8))

from IPython.display import Video
Video("vid3.mp4")  # the script & generated videos should be in the same folder
Similarly, you can create videos where the objects move in other directions.
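Once the videos are written, a minimal sketch of reading one back for processing (skvideo.io.vread returns the frames as a NumPy array):

import skvideo.io as sk

# Load a video back as an array of frames: (num_frames, height, width, channels)
frames = sk.vread("vid0.mp4")
print(frames.shape)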

How to extract foreground objects from COCO dataset or Open Images V6 Dataset?

Currently, I am preparing a synthetic dataset for an object detection task. There are annotated datasets available for this kind of task, like the COCO dataset and Open Images V6. I am trying to download only the foreground objects of a specific class, e.g. person; in other words, just the objects on a transparent background. The reason I am doing this is that I want to edit those images and insert them into new images, e.g. a street scene.
What I have tried so far: I used a library called FiftyOne, downloaded the dataset with its semantic labels, and got stuck there; I don't know what else to do.
It is not necessary to use FiftyOne; any other method would work.
Here is the code that I have used to download a sample of the dataset with its labels:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    dataset_dir="path/fiftyone",
    label_types=["segmentations"],
    classes=["person"],
    max_samples=10,
    label_field="instances",
    dataset_name="coco-images-person",
)

# Export the dataset
dataset.export(
    export_dir="path/fiftyone/image-segmentation-dataset",
    dataset_type=fo.types.ImageSegmentationDirectory,
    label_field="instances",
)
Thank you
The easiest way to do this is to use FiftyOne to iterate over your dataset in a simple Python loop, using OpenCV and NumPy to format and write the images of object instances to disk.
For example, this function will take in any collection of FiftyOne samples (either a Dataset or a View) and write all object instances to disk in folders separated by class label:
import os

import cv2
import numpy as np

def extract_classwise_instances(samples, output_dir, label_field, ext=".png"):
    print("Extracting object instances...")
    for sample in samples.iter_samples(progress=True):
        img = cv2.imread(sample.filepath)
        img_h, img_w, c = img.shape
        for det in sample[label_field].detections:
            mask = det.mask
            # Bounding box coordinates are relative; convert to pixels
            [x, y, w, h] = det.bounding_box
            x = int(x * img_w)
            y = int(y * img_h)
            # The instance mask already has the pixel dimensions of the box
            h, w = mask.shape
            mask_img = img[y:y+h, x:x+w, :]
            # Use the mask as an alpha channel so the background is transparent
            alpha = mask.astype(np.uint8) * 255
            alpha = np.expand_dims(alpha, 2)
            mask_img = np.concatenate((mask_img, alpha), axis=2)
            label = det.label
            label_dir = os.path.join(output_dir, label)
            if not os.path.exists(label_dir):
                os.mkdir(label_dir)
            output_filepath = os.path.join(label_dir, det.id + ext)
            cv2.imwrite(output_filepath, mask_img)
Here is a complete example that loads a subset of the COCO2017 dataset and writes all "person" instances to disk:
import os

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset_name = "coco-image-example"
if dataset_name in fo.list_datasets():
    fo.delete_dataset(dataset_name)

label_field = "ground_truth"
classes = ["person"]

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    label_types=["segmentations"],
    classes=classes,
    max_samples=20,
    label_field=label_field,
    dataset_name=dataset_name,
)

view = dataset.filter_labels(label_field, F("label").is_in(classes))

output_dir = "/path/to/output/segmentations/dir/"
os.makedirs(output_dir, exist_ok=True)

extract_classwise_instances(view, output_dir, label_field)
If this capability is something that will be used regularly, it may be useful to write a custom dataset exporter for this format.
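Since the stated goal is to paste these instances into new scenes, here is a hedged sketch of compositing one of the exported RGBA PNGs onto a background with Pillow; both file paths are placeholders, not files produced by the code above:

from PIL import Image

# Placeholder paths: one of the instance PNGs written above, and any scene image
instance = Image.open("person/some_instance_id.png")  # RGBA, alpha from the mask
scene = Image.open("street_scene.jpg").convert("RGB")

# Paste at an arbitrary position, using the alpha channel as the paste mask
scene.paste(instance, (100, 150), mask=instance)
scene.save("composited_scene.jpg")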

How can I resize a mask and RGB image to match by cropping out unwanted regions in both images

I am working on a cell-counting project with a histology dataset of RGB images and their corresponding masks. However, I have been stuck for over a week on cropping the RGB and mask images down to only the FOV, i.e. removing the regions of zero pixels (clearly visible on the masks) without affecting the annotations within. Any suggestions will be beneficial. A screenshot of the images I obtained is shown below:
** My Code **
# Data Path
IMAGE_PATH = '/content/drive/MyDrive/dissertation/QCed single-rater dataset/rgb/'
MASKS_PATH = '/content/drive/MyDrive/dissertation/QCed single-rater dataset/mask/'
TRUE_LABEL_PATH = '/content/drive/MyDrive/dissertation/QCed single-rater dataset/visualization/'
dataset_path = '/content/drive/MyDrive/dissertation/NuCLS_dataset/'
# Get train and test IDs
image_ids = sorted(os.listdir(IMAGE_PATH)) #next(os.walk(IMAGE_PATH))[2]
mask_ids = sorted(os.listdir(MASKS_PATH)) #next(os.walk(MASKS_PATH))[2]
true_ids = sorted(os.listdir(TRUE_LABEL_PATH)) #next(os.walk(MASKS_PATH))[2]
#training data (train_imgs, train_masks, true_labels are loaded earlier, not shown)
train_data = train_imgs[:int(train_imgs.shape[0]*0.85)] #training data = 85% train_imgs
train_mask = np.squeeze(train_masks[:int(train_masks.shape[0]*0.85)]) # train mask
train_label = true_labels[:int(true_labels.shape[0]*0.85)] #training data = 85% train_imgs
# validation data
val_data = train_imgs[int(train_imgs.shape[0]*0.85):int(train_imgs.shape[0]*0.95)] # validation data = 10%train_imgs
val_mask = np.squeeze(train_masks[int(train_masks.shape[0]*0.85):int(train_imgs.shape[0]*0.95)]) # val mask
val_label = true_labels[int(true_labels.shape[0]*0.85):int(true_labels.shape[0]*0.95)]
#test data
test_data = train_imgs[int(train_imgs.shape[0]*0.95):] # test data = 5%train_imgs
test_mask = np.squeeze(train_masks[int(train_masks.shape[0]*0.95):]) # val mask
test_label = true_labels[int(true_labels.shape[0]*0.95):]
print(val_mask.shape)  # prints (174, 256, 256, 3)
for ix in range(0, 5):
    print('Training example No.', ix)
    fig = plt.figure(figsize=(16, 16))
    plt.subplot(131).set_title('Original Image')
    plt.imshow(test_data[ix])
    plt.subplot(132).set_title('Mask (Target)')
    plt.imshow(test_mask[ix])
    plt.subplot(133).set_title('True Label')
    plt.imshow(test_label[ix])
    #plt.savefig(base_path + 'fig- Sanity check on training dataset no {}.png'.format(ix))
    plt.show()
Additional Information
I also have a CSV file containing the dimensions of the purple region, which I want both the RGB and mask images to be resized to. I am just stuck on implementing this for the RGB and mask images.
** My Answer **
Here is how I resolved this issue, for anyone who might face similar challenges.
Firstly, I ensured my file names match those in the CSV by simply adding the suffix '.png' to the fovname column of the CSV.
df['fovname'] = df['fovname'].astype(str)+ '.png'
print (list(df['fovname']))
Then, cropping the images with the appropriate FOV coordinates solves the issue.
# RGB images
data_path = '/content/drive/MyDrive/dissertation/NuCLS_dataset/NEW/RGB/'
for x in image_ids:
    im = Image.open(IMAGE_PATH + x)
    DF = df.loc[df['fovname'] == x]
    DF = DF.drop_duplicates()
    # Take the scalar FOV coordinates for this image (not pandas Series)
    xmin = DF['xmin'].iloc[0]
    ymin = DF['ymin'].iloc[0]
    xmax = DF['xmax'].iloc[0]
    ymax = DF['ymax'].iloc[0]
    print(x)
    im = im.crop((xmin, ymin, xmax, ymax))
    im.save(data_path + '{}'.format(x))
    print('Saved')
    #plt.imshow(im)
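The same loop can be applied to the masks. A sketch, assuming the mask files share the fovname-based names; the output folder name here is hypothetical:

# Apply the identical crop to the masks
mask_out_path = '/content/drive/MyDrive/dissertation/NuCLS_dataset/NEW/MASK/'  # hypothetical
for x in mask_ids:
    im = Image.open(MASKS_PATH + x)
    DF = df.loc[df['fovname'] == x].drop_duplicates()
    box = (DF['xmin'].iloc[0], DF['ymin'].iloc[0], DF['xmax'].iloc[0], DF['ymax'].iloc[0])
    im.crop(box).save(mask_out_path + x)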
I think if you crop to the parts of the images you are interested in, then imshow() will zoom to show them in as much space as is available.
Cropping is discussed in a previous question, Cropping image by the center.
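For reference, a minimal center-crop sketch in Pillow (the image path is a placeholder):

from PIL import Image

def center_crop(im, crop_w, crop_h):
    # Crop a box of the requested size around the image centre
    w, h = im.size
    left = (w - crop_w) // 2
    top = (h - crop_h) // 2
    return im.crop((left, top, left + crop_w, top + crop_h))

cropped = center_crop(Image.open("example.png"), 256, 256)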

Python OpenCV - Eigenfaces face recognition

I'm trying to create a simple Eigenfaces face recognition app using Python and OpenCV. Unfortunately, when I run the app, I get this result:
(-1, '\n', 1.7976931348623157e+308), where -1 stands for "not found" and the confidence is... quite high (that value is DBL_MAX).
Could someone post the most basic OpenCV implementation of Eigenfaces?
Here is my approach to the problem. I use Python 2, as suggested in the official documentation (due to some problems with Python 3).
import cv2 as cv
import numpy as np
import os
num_components = 10
threshold = 10.0
faceRecognizer = cv.face_EigenFaceRecognizer.create(num_components, threshold)
images = []
labels = []
textLabels = ["Person1", "Person2", "Person3"]
destinedIm = cv.imread("images/set1/1.jpg", cv.IMREAD_GRAYSCALE)
destinedSize = destinedIm.shape
#Person1
img = cv.imread("images/set1/1.jpg", cv.IMREAD_GRAYSCALE)
imResized = cv.resize(img, destinedSize)
images.append(imResized)
labels.append(0)
#In similar way I read total 8 images of set1 and 6 images of set2 (2 different people, with label 0 and 1 respectively)
cv.imwrite("images/set2/resized.jpg", imResized) #this doesn't work
numpyImages = np.array(images)
numpyLabels = np.array(labels)
# cv.face_FaceRecognizer.train(self=faceRecognizer, src=images, labels=labels)
faceRecognizer.train(src=images, labels=numpyLabels)
testImage = cv.imread("images/set1/testIm.jpg", cv.IMREAD_GRAYSCALE)
# cv.face_FaceRecognizer.predict()
resultLabel, resultConfidence = faceRecognizer.predict(testImage)
print(resultLabel, "\n", resultConfidence)
testImage is another image of the person with label = 0.
I would look at the sizing of the testImage. Also, I used a different sizing method than you did and got it working:
face_resized = cv2.resize(img, (299, 299))
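For completeness, a minimal sketch of the Eigenfaces flow with consistent sizing. It assumes opencv-contrib-python (which provides the cv2.face module), and the image paths are placeholders; the key point is that every training image and the test image must have identical dimensions:

import cv2
import numpy as np

SIZE = (299, 299)  # every image must share these exact dimensions

# Train on a few grayscale images per person (paths are placeholders)
images, labels = [], []
for label, files in enumerate([["p1_a.jpg", "p1_b.jpg"], ["p2_a.jpg", "p2_b.jpg"]]):
    for f in files:
        img = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
        images.append(cv2.resize(img, SIZE))
        labels.append(label)

recognizer = cv2.face.EigenFaceRecognizer_create()
recognizer.train(images, np.array(labels, dtype=np.int32))

# The test image must be resized the same way before predicting
test = cv2.resize(cv2.imread("test.jpg", cv2.IMREAD_GRAYSCALE), SIZE)
print(recognizer.predict(test))  # (label, confidence)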

Wrong values while appending 2 lists obtained from OpenCV image reading

I'm reading images from 2 folders and then appending them. I'm using the following code to do it.
import os

import cv2
import numpy as np

def load_images_from_folder(folder):
    images = []
    labels = []
    # Use the folder name (e.g. "Val1") as the label for every image in it
    label = folder.split(os.path.sep)[-1].split("/")[-1]
    for filename in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, filename))
        if img is not None:
            images.append(img)
            labels.append(label)
    return images, labels
train_data_all_pos = load_images_from_folder("C:/Users/username/Desktop/Data/Val1")
train_features_pos = train_data_all_pos[0]
train_labels_pos = train_data_all_pos[1]
print(len(train_features_pos))
print(len(train_labels_pos))
path = "C:/Users/username/Desktop/Data/Val2"
train_data_all_neg = load_images_from_folder("C:/Users/username/Desktop/Data/Val2")
train_features_neg = train_data_all_neg[0]
train_labels_neg = train_data_all_neg[1]
print(len(train_features_neg))
print(len(train_labels_neg))
train_features = np.append(train_features_pos, train_features_neg)
train_labels = np.append(train_labels_pos, train_labels_neg)
print(len(train_features))
print(len(train_labels))
There are 163 images in Val1 and 340 in Val2. When I append, I get 503 for the labels, but for the features I get a value in the millions. Each image has dimensions 64*64*3, so the total I'm getting is 6180864, which is exactly 64*64*3*503.
The problem is that the dimensions of the features and labels no longer match. I'm making a mistake here, but I'm not sure how to rectify it.
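For what it's worth, the count in the millions is the giveaway: np.append without an axis argument flattens its inputs into a 1-D array, so 503 images of 64*64*3 pixels become 6180864 individual values. A sketch of the usual fix, stacking along the first axis instead:

import numpy as np

# np.append without axis= flattens: 503 images of 64*64*3 -> 6180864 scalars
a = np.zeros((163, 64, 64, 3))
b = np.zeros((340, 64, 64, 3))
print(np.append(a, b).shape)  # (6180864,)

# Concatenate along the image axis instead, keeping one entry per image
features = np.concatenate([a, b], axis=0)
print(features.shape)  # (503, 64, 64, 3)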
