I have two LMDB files; with the first one my network trains fine, while with the second it doesn't really work (the loss starts and stays at 0). So I figured that maybe there's something wrong with the second LMDB. I tried writing some Python code (mostly taken from here) to fetch the data from my LMDBs and inspect it, but so far no luck with either of the two databases. The LMDBs contain images as data and bounding-box information as labels.
Doing this:
datum = caffe.proto.caffe_pb2.Datum()
for key, value in lmdb_cursor:
    datum.ParseFromString(value)
    label = datum.label
    data = caffe.io.datum_to_array(datum)
on either of the LMDBs gives me a key which is correctly the name of the image, but the datum.ParseFromString call is not able to retrieve anything from value: label is always 0, while data is an empty ndarray. Nonetheless, the data is there; value is a binary string of around 140 KB, which correctly accounts for the size of the image plus the bounding-box information, I guess.
I tried browsing several answers and discussions dealing with reading data from LMDBs in Python, but I couldn't find any clue on how to read structured information such as bounding-box labels. My guess is that the parsing function expects a plain digit label and interprets the first bytes as one, and the remaining data is then lost because the rest of the binary string no longer makes sense?
I know for a fact that at least the first LMDB is correct since my network performs correctly in both training and testing using it.
Any input will be greatly appreciated!
The basic element stored in your LMDB is not Datum, but rather AnnotatedDatum. Therefore, you need to approach it with a little care:
annotated_datum = caffe.proto.caffe_pb2.AnnotatedDatum()
annotated_datum.ParseFromString(value)
datum = annotated_datum.datum               # the image data
annotated_datum.annotation_group            # stores the bounding-box annotations
I want to detect the characters in an image like this with Python:
In this case the code should return the result '6010001'.
How can I get the result out of this image? What do I need?
For your information: if the solution is an AI solution, there are about 20,000 labeled images available.
Thank you in advance :)
Question: Are all the pictures of a similar nature?
Meaning: are the numbers stamped into a similar material, or are they random pictures of numbers made with different techniques (e.g. pen-drawn, stamped, etc.)?
If they are all quite similar (nice contrast, as in the sample pic), I would recommend writing your "own" AI; otherwise use an existing neural network / library (as I assume you may want to avoid the pain of creating your own neural network - and tagging a lot of pictures).
If the pics are all quite "similar", here is the suggested approach (a rough code sketch follows the list):
greyscale the image and increase the contrast
define a box (larger than a digit), scan it over the image and count the black pixels; determine a valid count range for detecting a digit by trial, and avoid overlaps
for each hit, take the area, split it into sectors, e.g. 6x4, and count the black pixels per sector
build a little knowledge base (a CSV file) of the per-sector counts for each number from 0-9 (e.g. as a string); you will end up with multiple valid strings per number in the database, just ensure they are unique (otherwise refine steps 1-3)
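A minimal sketch of this sector-count fingerprint idea in Python with opencv-python and NumPy; the filename, the threshold of 128 and the 6x4 sector grid are illustrative assumptions, and the box-scanning step from the list above is left as a comment:

import cv2
import numpy as np

def digit_fingerprint(digit_img, rows=6, cols=4):
    # split a binarized digit crop into rows x cols sectors and
    # count the black pixels per sector, returning a string key
    h, w = digit_img.shape
    counts = []
    for r in range(rows):
        for c in range(cols):
            sector = digit_img[r * h // rows:(r + 1) * h // rows,
                               c * w // cols:(c + 1) * w // cols]
            counts.append(int(np.sum(sector == 0)))  # 0 = black after thresholding
    return "-".join(str(n) for n in counts)

img = cv2.imread("digits.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)
# locate digit boxes by scanning as in step 2, then for each box:
# key = digit_fingerprint(binary[y:y + box_h, x:x + box_w])
# and look key up in your CSV knowledge base of known strings per digit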
In addition, I recommend making yourself a smart knowledge database, meaning: if a digit could not be identified, save the digit picture and the result. Then make yourself a little review program that shows you the undefined digits and the result string; you can then manually add them to your knowledge database under the respective number.
Hope it helps. I used the same approach to read a lot of different data from screen pictures and store it in a database. Works like a charm.
Better to do it yourself than to use a standard neural network :)
You can use opencv-python and pytesseract
import cv2
import pytesseract

# pytesseract also needs the Tesseract OCR engine installed on the system
img = cv2.imread('img3.jpeg')
text = pytesseract.image_to_string(img)
print(text)
It doesn't work for all images with text, but works for most.
I have a customer that insists on sending box labels in a PDF document. Printing the labels all at once can cause confusion which results in wrong labels being put on boxes. Is it possible to scan a barcode from a box and print the corresponding label (with identical barcode), one print per scan, until all the labels have been printed from the document? There is often more than one label with the same barcode, but I want each to print only once, and not allow duplicates. I am beginning to learn Python, but have no clue where to begin to write a script to do this operation.
If you are really a beginner at Python: there is a great book about automation, and the web version is free:
https://automatetheboringstuff.com
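To give you a starting point, here is a minimal sketch of the scan-then-print loop. It assumes pdf2image (which needs poppler), pyzbar (which needs the zbar library), and the CUPS lp command for printing; the filename and DPI are illustrative, and it relies on the fact that most USB barcode scanners simply type the code followed by Enter, so plain input() works:

import subprocess
from pdf2image import convert_from_path   # needs poppler installed
from pyzbar.pyzbar import decode          # needs the zbar library installed

PDF = "labels.pdf"   # hypothetical filename

# index each page of the PDF by the barcode it contains
barcode_pages = {}
for page_no, page in enumerate(convert_from_path(PDF, dpi=200), start=1):
    for symbol in decode(page):
        barcode_pages.setdefault(symbol.data.decode("ascii"), []).append(page_no)

printed = set()
while True:
    code = input("Scan a barcode (Enter to quit): ").strip()
    if not code:
        break
    remaining = [p for p in barcode_pages.get(code, []) if p not in printed]
    if not remaining:
        print("No unprinted label for", code)
        continue
    page = remaining[0]
    printed.add(page)   # each page prints only once, so no duplicates
    # print just that page via CUPS; lp's -P option takes a page list
    subprocess.run(["lp", "-P", str(page), PDF], check=True)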
I aim to design an app that recognizes a certain type of object (let's say, a book) and can tell whether the input effectively is a book or not (binary classification).
For a better user experience, I would like the input to be a video rather than a picture: that way, the user won't have to deal with issues such as sharpness or centering of the object. He'll just have to make a "scan" of the object, without much concern for the quality of any single image.
And here comes my problem: as I intend to create my training dataset from scratch (the actual object I want to detect being absent from existing datasets such as ImageNet), I was wondering whether videos are irrelevant for this type of binary classification and whether I should rather ask the user to take a good picture of the object.
On one hand, videos have the advantage of yielding a larger dataset than one created from photos alone (though I can expand my picture dataset with data augmentation), as it is easier to take a 10 s video of an object than to take 10x24 (more or less…) pictures of it.
But on the other hand, I fear the result will be less precise, as in a video many frames are redundant and the average quality might not be as good as that of a single, properly taken image.
Moreover, I do not intend to use the temporal property of the video (in a scan the temporality is useless) but rather to work one frame at a time (as depicted in this article).
What is the proper way of constituting my dataset? I really would like to keep this "scan" for the user's comfort, so if images are more precise than videos for such a classification, is it possible to automatically extract a single image from a "scan" and work directly on it?
Good question! The answer is: you should train your model on how you plan to use it. So if you ask the user to take photos, train it on photos. If you ask the user to film the object, train on frames extracted from video.
The images might seem blurry to you, but they won't be for a computer. It will just learn to detect "blurry books", but that's OK, that's what you want.
Of course this is not always the case. The image might become so blurry that the information whether or not there is a book in the frame is no longer there. Where is the line? A general rule of thumb: if you can see it's a book, the computer will also see it. As I think blurry images of books will still be recognizable as books, I think you could totally do it.
Creating "photos (single image, sharp)" from "scan (more blurry, frames from video)" can be done, it's called super-resolution. But those models are pretty beefy, not something you would want to run on a mobile device.
On a completely unrelated note: try googling Transfer Learning! It will benefit you for sure :D.
I have a huge sequence (1,000,000) of small matrices (32x32) stored in an HDF5 file, each one with a label.
Each of these matrices represents the data of a sensor at a specific time.
I want to obtain the evolution of each pixel over a small time slice, different for each x,y position in the matrix.
This is taking more time than I expected.
def getPixelSlice(self, xpixel, ypixel, initphoto, endphoto):
    # obtain h5 keys inside the time range between initphoto and endphoto
    valid = np.where(np.logical_and(self.photoList >= initphoto, self.photoList < endphoto))

    # look at pixel data in valid frames:
    # for each valid frame, obtain the data and append the target pixel to the list
    evolution = []
    for frame in valid[0]:
        data = self.h5f[str(self.photoList[frame])]
        evolution.append(data[ypixel][xpixel])
    return evolution, valid
So, there is a problem here that took me a while to sort out for a similar application. Due to the physical limitations of hard drives, the data are stored in such a way that a three-dimensional array will always be easier to read in one orientation than in another. It all depends on the order you stored the data in.
How you handle this problem depends on your application. My specific application can be characterized as "write few, read many". In this case, it makes the most sense to store the data in the order that I expect to read it. To do this, I use PyTables and specify a "chunkshape" that is the same as one of my timeseries. So, in your case it would be (1,1,1000000). I'm not sure if that size is too large, though, so you may need to break it down a bit further, say (1,1,10000) or something like that.
For more info see PyTables Optimization Tips.
For applications where you intend to read in a specific orientation many times, it is crucial that you choose an appropriate chunk shape for your HDF5 arrays.
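A minimal sketch of that layout with PyTables, storing the stack as a single (32, 32, 1000000) array with time as the last axis, as suggested above; the file name, array name and dtype are assumptions:

import tables

with tables.open_file("sensor.h5", mode="w") as h5f:
    # chunkshape (1, 1, 10000) makes each chunk a piece of one pixel's
    # timeseries, so reading one pixel's evolution touches few chunks
    arr = h5f.create_carray(h5f.root, "frames", tables.Float32Atom(),
                            shape=(32, 32, 1000000),
                            chunkshape=(1, 1, 10000))
    # write one frame at a time: arr[:, :, t] = frame
    # read one pixel's evolution over a time slice: arr[ypixel, xpixel, init:end]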
In a Django app, users upload various photos and get upvoted/downvoted (kind of like 9gag).
I want to put in place a basic check that prevents the user from re-submitting images already recently submitted on the website.
I don't need an airtight solution. How my question differs from other such questions on SO is that this isn't just a case of comparing two images; this is a case of comparing an uploaded image to, say, the 200 most recently uploaded images (my arbitrary cut-off). Performance takes the front seat.
Since I'm already thumbnailing all images (40px x 40px), I'm going to compare photo thumbnails instead of full-blown photos. This is equivalent to comparing down-sampled versions, so it'll be faster and more fuzzy (which is good).
My question is: is there a decent way to reduce an image histogram to a single number (in base 10 or 16, for instance)? If there is, I can store these numbers in the DB, compute the distance between such values, and impose an arbitrary cut-off. An illustrative example would be nice. This, in my head, sounds like the fastest way to handle my case.
Alternatively, if it can't be done due to various reasons, that's a legit answer too.
You probably want to use some sort of perceptual image hashing. I haven't tried it, but it looks like https://pypi.python.org/pypi/ImageHash might do the trick.
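For illustration, a minimal sketch with the ImageHash package (pip install ImageHash), applied to the 40x40 thumbnails; the filename, the example stored hashes and the cutoff of 5 are assumptions you would tune:

from PIL import Image
import imagehash

new_hash = imagehash.phash(Image.open("upload_thumb.png"))   # hypothetical thumbnail

# recent_hashes would hold the hex strings of the 200 most recent uploads,
# loaded from the DB; phash defaults to 64 bits, i.e. 16 hex characters
recent_hashes = ["d1c1b2a3e4f50617"]
for stored in recent_hashes:
    if new_hash - imagehash.hex_to_hash(stored) <= 5:   # Hamming distance cutoff
        print("Looks like a near-duplicate")
        break

Storing str(new_hash) per photo keeps the check to one small string column plus a cheap bit-difference per comparison, which should be fast enough for a 200-image window.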