Recognize what is in an image - Python

I'm doing a little project with neural networks. I've read about digit recognition with the MNIST dataset and wondered whether it's possible to make a similar dataset, but with regular objects we see every day.
So here's the algorithm (if we can call it that):
Everything is done with the OpenCV library for Python.
1) Get contours from the image. These are not literally contours, but something that looks like them.
I've done this with the following code:
def findContour(self):
    gray = cv2.cvtColor(self.image, cv2.COLOR_BGR2GRAY)
    gray = cv2.bilateralFilter(gray, 11, 17, 17)
    self.image = cv2.Canny(gray, 30, 200)
2) Next, I need to create a training set.
I copy and edit this image, changing its rotation and flipping it -- now we have about 40 images, each consisting of a rotated contour.
3) Now I'm going to dump these images to a CSV file.
These images are represented as 3D arrays, so I flatten them using NumPy's .flatten() function. Then this flattened vector is written to the CSV file, with the label as the last element.
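For what it's worth, here is a minimal sketch of that dump step (the image list and the file name are placeholders, not part of the original code):

import csv
import numpy as np

# stand-in for the rotated/flipped contour images from step 2 (hypothetical data)
images = [(np.zeros((28, 28), dtype=np.uint8), 0)]

with open('dataset.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for image, label in images:
        row = image.flatten().tolist()  # flatten the pixel array into one long vector
        row.append(label)               # the label goes last, as described above
        writer.writerow(row)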
This is what I've done, and I want to ask: will it work out?
Next I want to use everything except the last element as the input x vector, and the last element as the y value (like here).
Recognition will be done the same way: we get the contour of the image and feed it to the neural network, and the output will be the label.
Is it even possible, or is it better not to try?

There is plenty of room for experimentation. However, you should not reinvent the wheel, except as a learning exercise. Research the paradigm, learn what already exists, and then go make your own wheel improvements.
I strongly recommend that you start with image recognition in CNNs (convolutional neural networks). A lot of wonderful work has been done with the ILSVRC 2012 image data set (a.k.a. ImageNet files). In fact, a large part of today's NN popularity comes from Alex Krizhevsky's breakthrough (resulting in AlexNet, the first NN to win the ILSVRC) and ensuing topologies (ResNet, GoogleNet, VGG, etc.).
The simple answer is to let your network "decide" what's important in the original photo. Certainly, flatten the image and feed it contours, but don't be surprised if a training run on the original images produces superior results.
Search for resources on "Image Recognition introduction" and pick a few of the hits that match your current reading and topic interests. There are plenty of good ones out there.
When you get to programming your own models, I strongly recommend that you use an existing framework, rather than building all that collateral from scratch. Dump the CSV format; there are better ones with pre-packaged I/O routines and plenty of support. The idea is to let you design your network, rather than manipulating data all the time.
Popular frameworks include Caffe, TensorFlow, Torch, Theano, and CNTK, among others. So far, I've found Caffe and Torch to have the easiest overall learning curves, although there's not so much difference that I'd actually recommend one over another in general. Look for one that has good documentation and examples in your areas of interest.
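As a taste of how little boilerplate a framework leaves you with, here is a minimal sketch of a tiny CNN in TensorFlow's Keras API (the input shape and class count are placeholder assumptions, not recommendations):

from tensorflow import keras
from tensorflow.keras import layers

# A tiny CNN sketch: 64x64 grayscale inputs, 10 classes (both are assumptions)
model = keras.Sequential([
    layers.Conv2D(16, 3, activation='relu', input_shape=(64, 64, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=10)  # x_train/y_train come from your own dataset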

Related

Get characters out of an image with Python

I want to detect the characters in an image like this with Python:
In this case the code should return the result '6010001'.
How can I get the result out of this image? What do I need?
For your information: if the solution is an AI solution, there are about 20,000 labeled images available.
Thanks in advance :)
Question: Are all the pictures of a similar nature?
Meaning, are the numbers stamped into a similar material, or are they random pictures of numbers made with different techniques (e.g. pen-drawn, stamped, etc.)?
If they are all quite similar (nice contrast, as in the sample pic), I would recommend writing your "own" AI; otherwise use an existing neural network / library (as I assume you may want to avoid the pain of creating your own neural network - and tagging a lot of pictures).
If the pics are quite "similar", here is the suggested approach (a rough sketch in code follows the list):
1. greyscale the image and increase the contrast
2. define a box (larger than a digit), scan it over the image and count the 0s; define by trial a valid range that detects a digit, and avoid overlaps
3. for each hit, take the area, split it into sectors, e.g. 6x4, and count the 0s per sector
4. build a little knowledge base (a CSV file) of the counts per sector for each number from 0-9 (e.g. as a string); you will end up with multiple valid strings per number in the database, just ensure they are unique (otherwise redefine steps 1-3)
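Here is a rough, untuned sketch of steps 1-3 in Python with OpenCV (the file name, threshold, box size, valid range and 6x4 grid are all trial values you would have to adjust):

import cv2
import numpy as np

img = cv2.imread('sample.jpg', cv2.IMREAD_GRAYSCALE)
img = cv2.equalizeHist(img)                       # step 1: greyscale + more contrast
_, binary = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)

BOX_W, BOX_H = 20, 30                             # step 2: box a bit larger than a digit
for x in range(0, binary.shape[1] - BOX_W, BOX_W):
    window = binary[0:BOX_H, x:x + BOX_W]
    zeros = np.count_nonzero(window == 0)         # count the dark pixels in the box
    if 50 < zeros < 200:                          # "valid range" found by trial
        counts = [np.count_nonzero(s == 0)        # step 3: 6x4 sectors, 0s per sector
                  for row in np.array_split(window, 6, axis=0)
                  for s in np.array_split(row, 4, axis=1)]
        signature = '-'.join(map(str, counts))    # step 4: key for the knowledge base
        print(x, signature)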
In addition, I recommend making yourself a smart knowledge database, meaning: if a digit could not be identified, save the digit picture and the result. Then write yourself a little review program that shows you the undefined digits and the result string, so you can manually add them to your knowledge database under the respective number.
Hope it helps. I used the same approach to read a lot of different data from screen pictures and store it in a database. Works like a charm.
# Better to do it yourself than to use a standard neural network :)
You can use opencv-python and pytesseract:
import cv2
import pytesseract

# Load the image and let Tesseract OCR extract the text from it
img = cv2.imread('img3.jpeg')
text = pytesseract.image_to_string(img)
print(text)
It doesn't work for all images with text, but it works for most.

Object recognition with a CNN: what is the best way to train my model, photos or videos?

I aim to design an app that recognizes a certain type of object (let's say, a book) and that can say whether the input is effectively a book or not (binary classification).
For a better user experience, I would like the input to be a video rather than a picture: that way, the user won't have to deal with issues such as sharpness or centering of the object. He'll just have to make a "scan" of the object, without much concern for the quality of any single image.
And here comes my problem: as I intend to create my training dataset from scratch (the exact object I want to detect being absent from existing datasets such as ImageNet),
I was wondering if videos were irrelevant for this type of binary classification and if I should rather ask the user to take a good picture of the object.
On the one hand, videos have the advantage of yielding a larger dataset than one created from photos alone (though I can expand my picture dataset with data augmentation), as it is easier to take a 10 s video of an object than to take 10x24 (more or less…) pictures of it.
But on the other hand, I fear the result will be less precise, as many frames in a video are redundant and the average quality might not be as good as that of a single, proper image.
Moreover, I do not intend to use the temporal dimension of the video (in a scan the temporality is useless) but rather to work one frame at a time (as depicted in this article).
What is the proper way of building my dataset? As I would really like to keep this "scan" for the user's comfort: if images are more precise than videos for such a classification, is it possible to automatically extract a single image from a "scan" and work directly on that?
Good question! The answer is: you should train your model on how you plan to use it. So if you ask the user to take photos, train it on photos. If you ask the user to film the object, train on frames extracted from video.
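A minimal sketch of that extraction step with OpenCV (the file name and the every-5th-frame sampling are assumptions):

import cv2

# Grab every 5th frame of a user "scan" to build training images (5 is arbitrary)
cap = cv2.VideoCapture('scan.mp4')
frames, index = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % 5 == 0:
        frames.append(frame)
    index += 1
cap.release()
print(len(frames), 'training frames extracted')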
The images might seem blurry to you, but they won't be for a computer. It will just learn to detect "blurry books", but that's OK, that's what you want.
Of course this is not always the case. The image might become so blurry that the information whether or not there is a book in the frame is no longer there. Where is the line? A general rule of thumb: if you can see it's a book, the computer will also see it. As I think blurry images of books will still be recognizable as books, I think you could totally do it.
Creating "photos (single image, sharp)" from "scan (more blurry, frames from video)" can be done, it's called super-resolution. But those models are pretty beefy, not something you would want to run on a mobile device.
On a completely unrelated note: try googling Transfer Learning! It will benefit you for sure :D.

Analyse audio files with Python

I currently have a photodiode connected to my PC and do the capturing with Audacity.
I want to improve this by using an old RPi 1 as a dedicated test station. As a result, the shutter speed should appear on the console. I would prefer a Python solution for getting the signal and analysing it.
Can anyone give me some suggestions? I played around with oct2py, but I don't really understand how to calculate the time between the two peaks of the signal.
I have no expertise in sound analysis with Python; this is just what I found doing some internet research, as I am interested in this topic myself.
You can use pyAudioAnalysis, developed by Theodoros Giannakopoulos.
Towards your end, the function mtFileClassification() from audioSegmentation.py can be a good start. This function:
splits an audio signal into successive mid-term segments and extracts mid-term feature statistics from each of these segments, using mtFeatureExtraction() from audioFeatureExtraction.py
classifies each segment using a pre-trained supervised model
merges successive fixed-size segments that share the same class label into larger segments
visualizes statistics regarding the results of the segmentation-classification process
For instance:
from pyAudioAnalysis import audioSegmentation as aS
[flagsInd, classesAll, acc, CM] = aS.mtFileClassification("data/scottish.wav","data/svmSM", "svm", True, 'data/scottish.segments')
Note that the last argument of this function is a .segment file. This is used as ground truth (if available) in order to estimate the overall performance of the classification-segmentation method. If this file does not exist, the performance measure is not calculated. These files are simple comma-separated files of the format: start,end,label. For example:
0.01,9.90,speech
9.90,10.70,silence
10.70,23.50,speech
23.50,184.30,music
184.30,185.10,silence
185.10,200.75,speech
...
If I have understood your question correctly, this is at least the kind of output you want to generate, isn't it? Though I rather think you have to provide that file yourself.
Most of this information is quoted directly from the project's wiki, which I suggest you read. Don't hesitate to reach out, as I am really interested in this topic.
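Regarding the original question of timing the gap between two peaks, here is a minimal sketch with SciPy, assuming the capture is saved as a WAV file (the height and distance thresholds are placeholders you would tune for your photodiode signal):

import numpy as np
from scipy.io import wavfile
from scipy.signal import find_peaks

rate, samples = wavfile.read('capture.wav')   # signal recorded from the photodiode
if samples.ndim > 1:                          # keep one channel if stereo
    samples = samples[:, 0]
peaks, _ = find_peaks(np.abs(samples), height=10000, distance=rate // 100)
if len(peaks) >= 2:
    seconds = (peaks[1] - peaks[0]) / rate    # time between the first two peaks
    print('shutter open for %.5f s (1/%.0f s)' % (seconds, 1 / seconds))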
Other available libraries for audio analysis:

Training using a custom dataset instead of MNIST

I would like to use a custom dataset that contains images of handwritten characters from a language other than English. I am planning to use the KNN algorithm to classify the handwritten characters.
Here are some of the challenges I am facing at this point:
1. The images are of different sizes. How do we solve this issue - is there any ETL work to be done using Python?
2. Even if we assume they are all the same size, each image would be around 70 x 70 pixels, as the letters are more complex than English ones, with many distinguishing features between characters. How does this affect my training and the performance?
Choose a certain size and resize all the images (for example with the PIL module);
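A minimal sketch of that resizing step with Pillow (the folder names and the 70x70 target size are assumptions):

import os
from PIL import Image

# Resize every image in 'raw/' to a common size and save it to 'resized/'
os.makedirs('resized', exist_ok=True)
for name in os.listdir('raw'):
    img = Image.open(os.path.join('raw', name)).convert('L')  # greyscale
    img = img.resize((70, 70))                                # common size
    img.save(os.path.join('resized', name))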
I suppose that it depends on the quality of the data and on the language itself. If the letters are complex (like hieroglyphs), it will be difficult. Otherwise, if the letters are drawn with thin lines, they can be recognized even in small pictures.
In any case, if the drawn letters are too similar to each other, it will of course be more difficult to recognize them.
One interesting idea is not to simply use pixels as training data; you could create some special features, as described here: http://archive.ics.uci.edu/ml/datasets/Letter+Recognition
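For instance, instead of raw pixels you could feed the classifier a few summary statistics per image, in the spirit of that dataset; the particular features below are just an illustration:

import numpy as np

def simple_features(binary_img):
    """A few hand-crafted features, loosely inspired by the UCI letter dataset."""
    ys, xs = np.nonzero(binary_img)   # coordinates of the "ink" pixels
    h, w = binary_img.shape
    return [
        xs.mean() / w,                # horizontal centre of mass
        ys.mean() / h,                # vertical centre of mass
        xs.std() / w,                 # horizontal spread
        ys.std() / h,                 # vertical spread
        len(xs) / (h * w),            # ink density
    ]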

Image segmentation with Python

This is for a homework question on implementing clustering algorithms. The code has already been given to me, but it's implemented in MATLAB, and since I am using Python I don't know what to make of it. I think I'll have to write it from scratch.
I've been given a text file which contains feature vectors for an image.
data = np.loadtxt("filename").T
# data.shape = n,4
where the first two features are the chrominance and the last two are the coordinates of a pixel.
I have another file which contains some information about the image:
offset: 3
sx: 321
sy: 481
stepsize: 7
winsize: 7
Could anyone tell me how to form an image from a set of feature vectors?
Also, could anyone point me to some online resources for learning image segmentation with Python? Thanks.
OpenImageIO is a very good place to start. It's used by many professional imaging applications like The Foundry's Nuke and others.
As of 1.37, they've got an all-new Python API which can create images in all kinds of amateur and professional formats (like DPX, EXR, etc.) and all kinds of colorspaces (YCbCr, xvYCC, RGB, etc.).
It's worth a gander.
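As for the first question, forming an image from the feature vectors can be done with plain NumPy. A minimal sketch, assuming the last two columns are 1-based x/y pixel coordinates and that sx/sy from the info file are the image width and height (all assumptions to verify against your data):

import numpy as np

data = np.loadtxt('filename').T      # as in the question; data.shape == (n, 4)
sx, sy = 321, 481                    # dimensions from the info file

img = np.zeros((sy, sx))             # one canvas per feature channel
cols = data[:, 2].astype(int) - 1    # assuming column 3 is a 1-based x coordinate
rows = data[:, 3].astype(int) - 1    # assuming column 4 is a 1-based y coordinate
img[rows, cols] = data[:, 0]         # scatter the first chrominance feature back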
