How to create an OCR dataset? [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I'm a beginner in machine learning. So far I've only learnt supervised machine learning, through some basic image classification and regression problems. I've just done an image classification problem with sklearn's load_digits(), which has about 1800 images of the digits 0-9 (description of the dataset). What I want to do is make my own dataset instead of loading it from sklearn like:
from sklearn.datasets import load_digits
I want to use my own dataset. Can someone explain how I can make my own dataset, in CSV or any other format, so that I can use it with a supervised machine learning technique?

The first thing is to understand your use case. There is a difference between OCR and image classification tasks. Let's look at both scenarios.
Image Classification: The task is similar to the standard supervised tasks you may have seen in ML, only here we classify images instead of rows of tabular data. Data curation is one of the major tasks in image classification, and the accuracy you get depends heavily on how you process your data. Let's say that, given an image, you want to identify whether it shows a dog or a cat. This would require you to collect at least 500 images each of different types of dogs and cats. You can also create images artificially by taking an image of a dog, using the Python OpenCV library to add some noise or rotation, and saving the updated image. This way you can collect more images in a short span of time. Once you have images for all the categories you want to classify (dogs and cats), you can move on to model selection. CNNs (Convolutional Neural Networks) are considered the best choice for image classification tasks, but creating them from scratch and tuning them can take a long time. My advice would be to use the TensorFlow Object Detection API, which provides a good framework for beginners to build their own image classifier or object detector, with many pre-trained models to choose from. https://github.com/tensorflow/models/tree/master/research/object_detection
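The augmentation idea above can be sketched as follows. This is only a minimal illustration, written with NumPy instead of OpenCV so that it runs without extra dependencies; the function name augment, the noise level and the random stand-in image are assumptions made for the example. NumPy only rotates by right angles, so for arbitrary angles you would reach for an image library such as OpenCV.

```python
import numpy as np

def augment(image, rng, noise_std=10.0):
    """Return a few noisy / rotated / mirrored variants of one grayscale image."""
    variants = []
    # Additive Gaussian noise, clipped back to the valid 0-255 range.
    noisy = image.astype(np.float32) + rng.normal(0.0, noise_std, image.shape)
    variants.append(np.clip(noisy, 0, 255).astype(np.uint8))
    # Rotations by 90, 180 and 270 degrees.
    for k in (1, 2, 3):
        variants.append(np.rot90(image, k))
    # Horizontal mirror.
    variants.append(np.fliplr(image))
    return variants

rng = np.random.default_rng(0)
# Random pixels stand in for a real photo loaded from disk.
original = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
augmented = augment(original, rng)
print(len(augmented), augmented[0].shape, augmented[0].dtype)
```

Each variant keeps the label of the original image. Be careful with characters, though: flipping or rotating a digit can change its meaning (a rotated 6 looks like a 9), so for character data small noise and small rotations are usually the safer choice.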
OCR: OCR is one of the more complex applications of image classification, and it is not easy to build from scratch. The example you mention in your question looks like OCR, but it is really an image classification task, since you have a single image of each character that you are trying to classify. In the real world, OCR would involve things like handwritten notes and extracting the text written in them into your system, which is a complicated process. There are prebuilt libraries such as Tesseract that specialize in OCR: they take an input image with text on it and return the text present in the image as a string. However, these libraries tend to fail on handwritten text, which is much more difficult to read. If you are interested in building an OCR system from scratch, it will require a great deal of image processing. Let's say you have an image with a phone number that someone has written on it. Your OCR system would first have to detect each digit separately by drawing detection boxes around each digit in the image (you can use the TensorFlow Object Detection API mentioned above). But if you have an image containing letters, numbers and symbols, the task becomes more complex, because you first have to collect individual images of each letter, number and symbol, which could be tough. My advice, again, would be to use an existing API that is free and also much more accurate. I used the Microsoft Cognitive Vision API, which has an OCR function to detect any type of text in an image. This would reduce your effort to just cleaning the image properly.
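For the single-character case from the question, where every image shows one labelled character, a CSV file works fine as a storage format. The sketch below assumes one row per image with the label in the first column and the flattened pixels after it; the function names, the file name my_digits.csv and the random 8x8 arrays (standing in for your own scanned characters) are made up for the illustration.

```python
import csv
import os
import tempfile

import numpy as np

def save_dataset_csv(path, images, labels):
    """Write one row per image: the label first, then the flattened pixels."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for image, label in zip(images, labels):
            writer.writerow([label] + image.ravel().tolist())

def load_dataset_csv(path):
    """Read the CSV back into X (n_samples, n_pixels) and y (n_samples,)."""
    data = np.loadtxt(path, delimiter=",", dtype=np.int64, ndmin=2)
    return data[:, 1:], data[:, 0]

# Synthetic 8x8 "images" with values 0-16, the same range load_digits() uses.
rng = np.random.default_rng(0)
images = rng.integers(0, 17, size=(6, 8, 8))
labels = [0, 1, 2, 0, 1, 2]

csv_path = os.path.join(tempfile.mkdtemp(), "my_digits.csv")
save_dataset_csv(csv_path, images, labels)
X, y = load_dataset_csv(csv_path)
print(X.shape, y.tolist())
```

X and y have the same layout as the data and target arrays returned by load_digits(), so they can be passed to a scikit-learn estimator in the same way.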

Related

How to perform specific object recognition on a image?

I have 3 images of different objects: a smartphone, a shirt and a packet of pasta.
I want to recognize each object in any image containing one of these objects.
For example, if the same phone appears in a picture, I want to see the phone with a bounding box drawn around it in that picture. If the phone is a different one, nothing should be drawn.
I first tried to perform object recognition using a neural network like Mask R-CNN with Python and TensorFlow. But I realized that I don't have a large training dataset, only my 3 images. Neural network algorithms seem suited to recognizing concepts like dog, smartphone or landscape, but not a particular dog, a specific smartphone or a specific landscape.
To get to the point: if the input is any picture that contains the same smartphone, the same shirt or the same packet of pasta, I want the program to detect that.
What algorithms are best suited to perform this recognition?
Try using weights pre-trained on the COCO dataset. Since the COCO weights have already been trained on thousands of items and images, you should be able to just run the splash feature to help detection with Mask R-CNN.
Worst case, if you want to train on your own dataset, just find a lot of photos online of the objects you want to detect, annotate them, then train.

Unable to improve the mask RCNN model for document images?

I am training a model to extract all the necessary fields from a resume, and I am using Mask R-CNN to detect the fields in the image. I have trained my Mask R-CNN model on 1000 training samples with 49 fields to extract. I am unable to improve the accuracy. How can I improve the model? Are there any pretrained weights that may help?
Difficulty in reading following text -
It looks like you want to do text classification/processing: you need to extract details from the text, but you are applying object detection algorithms. I believe you need to use OCR to extract the text (if you have the CV as an image) and then use a text classification model. Check out the links below for more information about text classification:
https://medium.com/@armandj.olivares/a-basic-nlp-tutorial-for-news-multiclass-categorization-82afa6d46aa5
https://www.tensorflow.org/tutorials/tensorflow_text/intro
You can break up the problem in two different ways:
Step 1- OCR seems to be the most direct way to get to your data. But increase the image size, and thus the resolution; otherwise you may lose data.
Step 2- Store the coordinates of each OCRed word. This is valuable information in this context: how words line up is significant.
Step 3- At this point you can try basic positional clustering to group words. However, this can easily fail when related text is laid out in columns rather than rows.
Step 4- See if you can identify which of the 49 tags these clusters belong to.
Look at Hidden Markov models and the Baum-Welch algorithm for text classification, i.e. go for basic models first.
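Steps 2 and 3 above can be illustrated with a very small sketch. It assumes the OCR engine has already returned each word with the top-left corner of its bounding box; the (text, x, y) tuple layout, the function name and the y_tolerance value are assumptions made for the example, not the output format of any particular OCR library. It groups words into rows only, so it shares the weakness mentioned in Step 3 on multi-column layouts.

```python
def group_words_into_lines(words, y_tolerance=10):
    """Group OCR words into text lines by the vertical position of their boxes.

    `words` is a list of (text, x, y) tuples, where (x, y) is the top-left
    corner of the word's bounding box in pixels.
    """
    lines = []
    for word in sorted(words, key=lambda w: w[2]):
        # Stay on the current line if this word is vertically close to the
        # previous word; otherwise start a new line.
        if lines and abs(word[2] - lines[-1][-1][2]) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within each line, read the words from left to right.
    return [" ".join(w[0] for w in sorted(line, key=lambda w: w[1])) for line in lines]

# Made-up OCR output for three lines of a resume.
ocr_words = [
    ("Doe", 160, 42), ("John", 100, 40),
    ("Email:", 100, 80), ("john@example.com", 170, 83),
    ("Python", 100, 121), ("Skills:", 30, 120),
]
for line in group_words_into_lines(ocr_words):
    print(line)
```

Each resulting line of text can then be passed to whatever text classifier you use for Step 4.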
OR
The above ignores the inherent classification opportunity offered by the image of a properly formatted CV.
Step 1- Train your model to partition the image into sections without OCR. A good model should not break up sentences, tables, etc. This approach may take advantage of separator lines and similar cues. There is also an opportunity to decrease the size of your image, since you are not OCRing yet.
Step 2- OCR the image sections and try to classify them as above.
Another option is to use a neural network such as PixelLink: Detecting Scene Text via Instance Segmentation
https://arxiv.org/pdf/1801.01315.pdf

Recognition of nipple exposure in the image, and Automatically cover nipple area

I'd like to implement something like the title describes, but I wonder if it's technically possible.
I know that it is possible to recognize pictures with a CNN,
but I don't know whether the nipple area can be covered automatically.
If you know of a library or any related information,
I would appreciate some advice.
CNNs are able to detect whatever you train them for, to varying degrees of accuracy. What you need is a lot of training samples (i.e. ground-truth pairs of the original image and the labeled image) with which to train your model, and then some new data on which you can test its accuracy. The point is that CNNs are not biased to innately learn a task; you have to tell them what to learn!
I can recommend the machine learning library Keras (https://keras.io/) if you plan to do some machine learning using CNNs, as it's pretty simple and fairly beginner-friendly. Work through some of the CNN tutorials, which are quite good.
Essentially, you have what I can only assume is a pretty niche problem. The main issue will come down to how much data you have to train your model. CNNs need a lot of training data, especially for a problem like this, which isn't simple. One way to make this simpler would be a model that detects the, ahem, area of interest and marks it on a per-pixel basis. Then a simple mask could be applied to the source image to censor it. This relates to image segmentation, and there are many academic papers on the topic.
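The last step, applying a per-pixel mask to the source image, is the easy part and can be sketched with NumPy alone. In the example below the mask is hard-coded; in practice it would be the output of a trained segmentation model. The function name censor and the array sizes are assumptions for the illustration.

```python
import numpy as np

def censor(image, mask, fill=0):
    """Return a copy of `image` with every pixel where `mask` is True replaced by `fill`."""
    censored = image.copy()
    censored[mask] = fill
    return censored

# Stand-ins: a random RGB image and a mask that a segmentation model might predict.
rng = np.random.default_rng(0)
image = rng.integers(1, 256, size=(64, 64, 3), dtype=np.uint8)
mask = np.zeros((64, 64), dtype=bool)
mask[20:30, 25:40] = True  # hypothetical detected region

result = censor(image, mask)
print(result[25, 30], np.array_equal(result[~mask], image[~mask]))
```

The same indexing works for blurring or pixelating instead of blacking out: compute the blurred version of the whole image and copy it into the result only where the mask is True.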

How to recognize real scenes image using scikit-learn?

I am new to scikit-learn. I have a lot of images, and they are not all the same size. Some are real-scene images like
cdn.mayike.com/emotion/img/attached/1/image/dt/20170920/12/20170920121356_795.png
cdn.mayike.com/emotion/img/attached/1/image/mainImg/20170916/15/20170916153205_512.png
, and others are not real-scene images, like
cdn.mayike.com/emotion/img/attached/1/image/dt/20170917/01/20170917011403_856.jpeg
cdn.mayike.com/emotion/img/attached/1/image/dt/20170917/14/20170917145613_197.png
.
I want to use scikit-learn to recognize which images are not real scenes. I think it is similar to http://scikit-learn.org/stable/auto_examples/applications/plot_face_recognition.html#sphx-glr-auto-examples-applications-plot-face-recognition-py. I have no idea how to begin. How do I create a dataset and extract features from images? Can someone tell me what I should do?
This doesn't seem to be directly a programming problem, and your questions relate to non-basic, current research.
You should read about natural scene statistics and familiarize yourself with one of the current machine learning frameworks, such as TensorFlow or Caffe.
There are many tutorials out there to get started. For example, you could begin with a binary classifier that outputs whether or not the given image shows a natural scene.
Your dataset could be structured like so:
-> Dataset
    -> natural_scenes
    -> artificial_images
NVIDIA DIGITS, for example, can use such a structure to create a dataset and can use models designed for Caffe and TensorFlow.
I would also recommend that you read about fine-tuning neural networks, as you would need a lot of images in your dataset if you started training from scratch.
In Caffe you can fine-tune pretrained models like CaffeNet or GoogLeNet.
I think that is enough basic information to get you started.
As for scikit-learn and face detection: face detection is more about looking for local candidates, or image patches, that could possibly contain a face. Your problem, on the other hand, is more of a global problem, as the whole image is concerned. That said, I would start off with a neural network here, which is able to extract both local and global features for you.
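If you want to read such a folder structure yourself instead of relying on a tool, a short sketch could look like this. It takes the label of each image from the name of its parent folder. The function name list_labelled_images and the placeholder file names are assumptions for the example, and the demo creates empty files in a temporary directory rather than reading real images.

```python
import tempfile
from pathlib import Path

def list_labelled_images(root, extensions=(".png", ".jpg", ".jpeg")):
    """Return sorted (path, label) pairs, taking the label from the parent folder name."""
    samples = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in extensions:
                samples.append((path, class_dir.name))
    return samples

# Build a tiny throwaway Dataset/ tree with empty placeholder files for the demo.
root = Path(tempfile.mkdtemp()) / "Dataset"
demo_files = {"natural_scenes": ["a.png", "b.jpeg"], "artificial_images": ["c.png"]}
for label, names in demo_files.items():
    (root / label).mkdir(parents=True)
    for name in names:
        (root / label / name).touch()

samples = list_labelled_images(root)
print([(p.name, label) for p, label in samples])
```

From here, each path would be loaded with an image library, resized to a common size (your images are not all the same size) and paired with its label for training.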

Steps involved in classifying images?

I am very new to machine learning and have been implementing ML algorithms on datasets.
But how do I go about classifying images using ML algorithms?
How do I feed the images to the learning models in the form of NumPy arrays?
Can anyone briefly explain the steps involved? I have been reading about feature extraction, but I can't figure out how to do it.
Image classification is not much different, at its core, from any other sort of classification.
Your data are images, right? Well, we need to create some variables ("features") from those images in order to get a sense of what's in them. Computers work with matrices of numbers; they don't perceive images the way humans do (although there are arguments that what humans do when they see images is deconstruct them into patterns of pixels, but let's keep it simple). Using OpenCV is a great way to turn image pixels into matrices.
Each matrix (i.e. each image) will have a corresponding tag or class label (e.g. "dog" or "cat"). You feed those matrices, together with their labels, to your algorithm so that it learns to classify each image.
That will get you started. There's so much that goes into machine learning related to images, but at its core, the problem is the same as elsewhere: take a matrix/set of data and use an algorithm to find patterns in the data and a function that maps the input to the output label. You might be served well by reading an intro to machine learning book or taking a course.
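To make the "matrix in, label out" idea concrete, here is a minimal sketch using NumPy only. It flattens each image matrix into one feature vector and classifies with a nearest-centroid rule, which is just about the simplest possible classifier and is used here purely for illustration; it is not the algorithm the answer has in mind, and the synthetic "dark" and "bright" images stand in for real photos that you would load into arrays with OpenCV or a similar library.

```python
import numpy as np

def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each row of X to the class with the nearest centroid."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]

# Synthetic 8x8 grayscale "images": one class has low pixel values, the other high.
rng = np.random.default_rng(0)
dark = rng.integers(0, 100, size=(20, 8, 8))
bright = rng.integers(150, 256, size=(20, 8, 8))
images = np.concatenate([dark, bright])
y = np.array(["dark"] * 20 + ["bright"] * 20)

# Flatten each image matrix into one feature vector and scale to [0, 1].
X = images.reshape(len(images), -1) / 255.0

classes, centroids = fit_centroids(X, y)
test_image = rng.integers(150, 256, size=(1, 8, 8)).reshape(1, -1) / 255.0
print(predict(test_image, classes, centroids))
```

Any scikit-learn classifier can take the place of the two helper functions, since X and y are already in the (n_samples, n_features) and (n_samples,) layout that its fit and predict methods expect.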
