Recognize images in Python

I'm kinda new both to OCR recognition and Python.
What I'm trying to achieve is to run Tesseract from a Python script to 'recognize' some particular figures in a .tif.
I thought I could do some training for Tesseract, but I didn't find any similar topic on Google or here at SO.
Basically I have some .tif files that contain several images (like an 'arrow', a 'flower' and other icons), and I want the script to print as output the name of each icon it finds. If it finds an arrow, then print 'arrow'.
Is it feasible?

This is by no means a complete answer, but if there are multiple images in the tif and if you know the size in advance, you can standardize the image samples prior to classifying them. You would cut up the image into all the possible rectangles in the tif.
So when you create a classifier (I don't mention the methods here), the end result would come from a synthesis of the classifications of all the smaller rectangles.
So if, given a tif, the 'arrow' or 'flower' images are 16px by 16px, say, you can use Python PIL to create the samples.
from PIL import Image

image_samples = []
im = Image.open("input.tif")
sample_dimensions = (16, 16)

# collect every 16x16 crop of the tif as a candidate sample
for box in get_all_corner_combinations(im, sample_dimensions):
    image_samples.append(im.crop(box))

# classify each sample and fuse the per-sample results into one label
classifier = YourClassifier()
classifications = []
for sample in image_samples:
    classifications.append(classifier(sample))

label = fuse_classifications(classifications)
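Note that get_all_corner_combinations is not an actual PIL function; I'm assuming a small helper along these lines, which slides a window over the image and yields every crop box of the requested size:

def get_all_corner_combinations(im, sample_dimensions, step=1):
    # im.size is (width, height); crop boxes are (left, upper, right, lower)
    width, height = im.size
    sample_w, sample_h = sample_dimensions
    for left in range(0, width - sample_w + 1, step):
        for upper in range(0, height - sample_h + 1, step):
            yield (left, upper, left + sample_w, upper + sample_h)

Sliding one pixel at a time is expensive on large tifs, so a coarser step (or the icons' known grid positions, if you have them) may be enough in practice.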
Again, I didn't talk about the learning step of actually writing YourClassifier. But hopefully this helps with laying out part of the problem.
There is a lot of research on the subject of learning to classify images as well as work in cleaning up noise in images before classifying them.
Consider browsing through this nice collection of existing Python machine learning libraries.
http://scipy-lectures.github.com/advanced/scikit-learn/index.html
There are many techniques that relate to images as well.
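For instance, 'YourClassifier' above could be as simple as a nearest-neighbour model from scikit-learn trained on flattened crops. This is only a minimal sketch; the training file names and labels below are made-up placeholders:

import numpy as np
from PIL import Image
from sklearn.neighbors import KNeighborsClassifier

def to_feature_vector(pil_image, size=(16, 16)):
    # grayscale, fixed size, flattened into a 256-dimensional vector
    return np.asarray(pil_image.convert("L").resize(size), dtype=np.float32).ravel()

# hypothetical labelled training crops saved beforehand
train_files = [("arrow1.png", "arrow"), ("arrow2.png", "arrow"),
               ("flower1.png", "flower"), ("flower2.png", "flower")]
X = [to_feature_vector(Image.open(path)) for path, _ in train_files]
y = [label for _, label in train_files]

classifier = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# later, for each 16x16 crop taken from the tif:
# print(classifier.predict([to_feature_vector(sample)])[0])  # e.g. 'arrow'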

Related

OpenCV: Stitch while saving the warping/arrangement information and re-stitch with the saved info

First of all, I am a beginner in the computer vision field, learning OpenCV from the web.
What I am trying to do is stitch multispectral (bands > 3) images with the OpenCV stitching APIs.
I already know that OpenCV doesn't support multispectral images.
So, the idea I came up with is as follows:
Extract the RGB images from each multispectral image.
Use cv2.Stitcher_create() and the stitcher.stitch method to stitch all the RGB images (reference: https://pyimagesearch.com/2018/12/17/image-stitching-with-opencv-and-python/), and save the warping and arrangement information (e.g. homography, matching keypoints...) produced while making the RGB panorama.
Stitch each remaining band's image by loading the information saved in step 2.
The problem is, I can't find code for saving and loading the information required in steps 2 and 3.
Is the suggested method possible? And if it is, are there any tips or references that I can use?
Yes, you can do it (I did it before for my paper on stitching construction plans). You need to save the camera parameters after the feature matching, and probably also the seam masks.
Look here (cameras) and here (seam masks).
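As a simpler illustration of the idea (a pairwise homography via ORB feature matching, not the full Stitcher/detail camera and seam machinery the links above describe), you could estimate the transform on the RGB renderings, save it, and reuse it on every other band. A minimal sketch, with placeholder file names:

import cv2
import numpy as np

# estimate the transform on the RGB renderings of two multispectral tiles
img1 = cv2.imread("tile1_rgb.png")
img2 = cv2.imread("tile2_rgb.png")

orb = cv2.ORB_create(4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# save the warping information for later reuse (step 2)
np.save("homography_tile1_to_tile2.npy", H)

# step 3: load it and warp any other band of tile1 into tile2's frame
# (a real panorama would need a larger output canvas than img2's size)
H = np.load("homography_tile1_to_tile2.npy")
band = cv2.imread("tile1_band5.png", cv2.IMREAD_GRAYSCALE)
h, w = img2.shape[:2]
warped_band = cv2.warpPerspective(band, H, (w, h))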

What could be a solution for automatically unwarping document images affected by 3D warping?

I want to make some kind of Python script that can do that.
In my case I just want very simple unwarping, as follows:
Always having a similar background
Always placing the page at a similar position
Always having the same type of warped image
I tried the following methods, but they didn't work out.
I tried many scanning apps, but no app can unwarp a 3D warp; for example, Microsoft Office Lens.
I tried page_dewarp.py, but it does not work with pages that have spaces between text blocks or segmented text; most of the time, for that kind of image, it just reverses the curve from left to right (or vice versa), and it is also unable to detect the actual text area.
I found deep-learning-for-document-dewarping, which tries to solve this problem using pix2pixHD, but I am not sure it will work; the project has no trained models and doesn't currently solve the problem. Should I train a model with just the following training data, as described for pix2pixHD: train_A (warped input images) and train_B (unwarped output images)? I can generate training data by making warped and unwarped images using Blender 3D. That way I can generate many images from some scanned book pages by rendering both the unwarped and warped versions, as if someone were taking photos of the pages, but virtually.
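For the simple flat case, what I have in mind is something like the four-point perspective unwarp below (the corner coordinates are hard-coded just to illustrate; in practice they would come from contour detection or manual clicks). It does not undo true 3D page curl, which is the part I'm stuck on:

import cv2
import numpy as np

img = cv2.imread("photo_of_page.jpg")  # placeholder file name

# the four page corners in the photo: top-left, top-right,
# bottom-right, bottom-left (hard-coded here for illustration)
src = np.float32([[120, 80], [980, 110], [1010, 1400], [90, 1380]])

# target rectangle, roughly A4 proportions
out_w, out_h = 1000, 1414
dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])

M = cv2.getPerspectiveTransform(src, dst)
flat = cv2.warpPerspective(img, M, (out_w, out_h))
cv2.imwrite("unwarped.jpg", flat)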

Is there a way to resize non-image files for CNN similarly to image-based examples?

I am working on a research project involving brain wave data. The goal is to classify each "image" as 1 or 0. The problem is essentially an image classification problem, where I could use a CNN, but it's not clean at all like most CNN examples online. The files that I have are TSVs (each file is an individual trial from a patient), and I have stacked them all into one pickle file, with each having the participant ID and trial ID as additional columns.
I want to feed them through a CNN, but almost all examples online deal with equal-sized images. My data aren't of equal size, and they aren't images. I want to use PIL to make each file the same size, but is PIL even the correct way of doing so, since I don't have image files?
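For example, would something like the following be a more appropriate route than PIL? It interpolates (or pads) each trial to a common shape with NumPy/SciPy; the target shape here is just an assumption about the data:

import numpy as np
from scipy.ndimage import zoom

TARGET_SHAPE = (64, 256)  # assumed common (channels, timesteps) shape

def resize_trial(trial):
    # trial: 2D numpy array loaded from one TSV, shape (channels, timesteps)
    factors = (TARGET_SHAPE[0] / trial.shape[0], TARGET_SHAPE[1] / trial.shape[1])
    return zoom(trial, factors, order=1)  # bilinear interpolation

def pad_trial(trial):
    # alternative: zero-pad/crop instead of interpolating (keeps original values)
    padded = np.zeros(TARGET_SHAPE, dtype=trial.dtype)
    rows = min(trial.shape[0], TARGET_SHAPE[0])
    cols = min(trial.shape[1], TARGET_SHAPE[1])
    padded[:rows, :cols] = trial[:rows, :cols]
    return padded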

Python: Recognize if image contains graphic/text or a picture

I want to write a script, that converts unknown images (jpg, png, gif, bmp, tiff, etc.) to a specific resolution and format as well as generating a thumbnail.
The problem is that a compression level that is totally fine for photos produces poor results for exports of presentations, for example; so I want to vary the conversion settings based on the contents of the image.
Does anyone have experience doing that kind of stuff in Python (or shell scripts whose output is easily parseable)?
My ideas are:
increasing the contrast and checking the histogram to see if only single spikes are left (see the sketch below)
doing high-pass filtering of the image and checking... what?
doing face recognition of known letters
The goal is that the recognition should be quite fast (approx. 10 images/second) and quite easy to implement.
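A rough sketch of the first idea: graphics such as slide exports tend to use only a few distinct tones, so most pixels fall into a handful of histogram bins. The thresholds below are just guesses and would need tuning:

from PIL import Image

def looks_like_graphic(path, spike_threshold=0.8, max_spikes=32):
    im = Image.open(path).convert("L")
    hist = im.histogram()  # 256 bins for a grayscale image
    total = float(sum(hist))
    # fraction of all pixels that falls into the `max_spikes` fullest bins
    top = sum(sorted(hist, reverse=True)[:max_spikes])
    return top / total >= spike_threshold

# print(looks_like_graphic("slide_export.png"))   # likely True
# print(looks_like_graphic("holiday_photo.jpg"))  # likely False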
This is a pretty trivial machine learning problem. I would research the MNIST dataset problems that teach you how to recognize handwritten characters; this process should be very similar. Check out this tutorial and see if you can modify it to recognize graphics vs. pictures. If your error rate ends up too high, you'll have to try more advanced machine learning techniques.
http://mxnet.io/tutorials/python/mnist.html

How do I prepare data (images) for py-faster-rcnn classification training?

I am trying to train my own image classifier with py-faster-rcnn using my own images.
It looks rather simple in the example here, but they are using a ready-made dataset (INRIA Person). The dataset is structured and cropped into sub-images (actually it contains both the original images and the cropped person images taken from them), plus a text annotation for each image with the coordinates of the crops. Pretty straightforward.
Still, I have no idea how this is done - do they use some sort of tool for this (I can hardly imagine such large amounts of data being cropped and annotated manually)?
Could anyone please suggest a solution for this one? Thanks.
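For illustration, the only thing I can think of is scripting the cropping and annotation myself, roughly like the sketch below; the file names and the annotation format are made up, not the actual INRIA layout:

from PIL import Image

# hypothetical bounding boxes for one image: (left, upper, right, lower)
boxes = [(34, 50, 120, 300), (200, 40, 290, 310)]
image_path = "img_0001.jpg"

im = Image.open(image_path)
with open("img_0001.txt", "w") as notes:
    for i, box in enumerate(boxes):
        im.crop(box).save(f"img_0001_crop_{i}.jpg")
        left, upper, right, lower = box
        notes.write(f"{image_path} crop_{i} {left} {upper} {right} {lower}\n")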
