Implement image segmentation with a Python generator

Following up on my last question: read big image file as an array in python
Due to the memory limitations of my laptop, I would like to implement an image segmentation algorithm with a Python generator that reads one pixel at a time rather than loading the whole image.
My laptop runs Windows 7 (64-bit) with 4 GB of RAM and an Intel Core i7-2860QM CPU, and the images I am processing are over 2 GB. The algorithm I want to apply is watershed segmentation: http://scikits-image.org/docs/dev/auto_examples/plot_watershed.html
The only similar example I could find is http://vkedco.blogspot.com/2012/04/rgb-to-gray-level-to-binary-python.html, but what I need is not just converting one pixel value at a time; I also need to consider the relations among neighboring pixels. How can I do this?
Any idea or hint for me? Thanks in advance!

Since RGB-to-graylevel conversion is a purely local operation, a streaming approach is trivial; the position of a pixel is irrelevant. Watershed, however, is a global operation: a single pixel can change the output dramatically. You have several options:
Write an implementation of watershed that works on tiles and iterates over multiple passes through the image. This sounds difficult to me.
Use a local method to segment, such as thresholding (see the sketch after this list).
Get a computer with more RAM. RAM is cheap, and you can stick tons of it into a desktop system.
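
To illustrate the second option, here is a minimal sketch of tile-based thresholding that streams the image from disk with numpy.memmap, so only one tile sits in RAM at a time. The file name, dimensions, and raw single-band uint8 layout are assumptions; adapt them to your actual format.

import numpy as np

HEIGHT, WIDTH, TILE = 40000, 50000, 1024  # assumed example dimensions

src = np.memmap("big_image.raw", dtype=np.uint8, mode="r", shape=(HEIGHT, WIDTH))
dst = np.memmap("mask.raw", dtype=np.uint8, mode="w+", shape=(HEIGHT, WIDTH))

for y in range(0, HEIGHT, TILE):
    for x in range(0, WIDTH, TILE):
        tile = src[y:y + TILE, x:x + TILE]
        # A fixed global threshold; Otsu or an adaptive threshold per tile
        # would slot in here just as easily
        dst[y:y + TILE, x:x + TILE] = (tile > 128) * 255

dst.flush()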

Related

Python: Recognize if image contains graphic/text or a picture

I want to write a script that converts unknown images (JPG, PNG, GIF, BMP, TIFF, etc.) to a specific resolution and format, as well as generating a thumbnail.
The problem is that a compression level that is totally fine for photos produces crap for exports of presentations, for example, so I want to vary the conversion settings based on the contents of the image.
Does anyone have experience doing that kind of thing in Python (or shell scripts whose output is easily parseable)?
My ideas are:
increase the contrast and check the histogram for whether only isolated spikes are left (see the sketch after this list)
do a high-pass filtering of the image and check... what, exactly?
apply detection techniques (as used for face recognition) to known letters
The goal is that the recognition should be quite fast (approx. 10 images/second) and fairly easy to implement.
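
For what it's worth, the histogram idea is cheap to prototype. Below is a rough sketch, assuming that graphic/text exports concentrate their pixels in a few gray levels while photos spread them out; the bin count and the 0.8 threshold are made-up values to tune on real samples.

import numpy as np
from PIL import Image

def looks_like_graphic(path, top_bins=8, threshold=0.8):
    # Normalized 256-bin grayscale histogram
    hist = np.asarray(Image.open(path).convert("L").histogram(), dtype=float)
    hist /= hist.sum()
    # Graphics: most pixels fall into a handful of dominant gray levels
    return float(np.sort(hist)[-top_bins:].sum()) > threshold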
This is a fairly standard machine learning problem. I would research the MNIST dataset problems that teach you how to recognize handwritten characters; the process should be very similar. Check out this tutorial and see if you can modify it to recognize graphics vs. pictures. If your error rate ends up too high, you'll have to try more advanced machine learning techniques.
http://mxnet.io/tutorials/python/mnist.html

How to calculate the depth of an object in an image captured by an Android camera

I have an image captured by an Android camera. Is it possible to calculate the depth of an object in the image? The image contains only an object and a background. Any suggestion, explanation, or link that you think could help will be appreciated.
OpenCV is the library you need.
I did some depth identification of water levels against a pure white background a few days ago. Generally, if you want to identify depth, you can convert the question into identifying the edges where colors change. In this case, you can convert the colorful pictures to grey and identify the changes at the white-black-grey interfaces. OpenCV is capable of doing the job at high speed.
Hope it helps. Let me know if you need further help.
Edits:
If you want to find the actual depths, you need to project the coordinate system of your pictures into the real world, or vice versa. To do this, you have to know a fixed location as your reference and the relationship between pixels and real distances.
What I did was find the fixed location and set it as zero. Afterwards, I measured the length of an object in the picture and also counted the number of pixels the object spans. From that I obtained the relationship between pixels and real distances.
Note that these procedures may introduce identification errors. I did it very carefully, and the error was acceptable in my case.
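
As a concrete sketch of that calibration (all numbers below are made-up examples): measure a reference object of known physical length, count the pixels it spans, and derive a scale factor.

# Reference object: known real-world length and its measured pixel length
reference_length_cm = 10.0
reference_length_px = 250.0
cm_per_pixel = reference_length_cm / reference_length_px

# Depth of a feature, measured in pixels from the fixed zero reference
depth_px = 420.0
depth_cm = depth_px * cm_per_pixel  # 16.8 cm with these example numbers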
With only one image, accurate depth estimation is nearly impossible. However, there are various methods of estimating depth under certain assumptions or given the availability of the camera calibration matrix. As mentioned by @WenlongLiu, OpenCV is a very good place to start.

Align text for OCR

I am creating a database from historical records which I have as photographed pages from books (100K+ pages). I wrote some Python code to do some image processing before I OCR each page. Since the data in these books does not come in well-formatted tables, I need to segment each page into rows and columns and then OCR each piece separately.
One of the critical steps is to align the text in the image.
For example, this is a typical page that needs to be aligned:
A solution I found is to smudge the text horizontally (I'm using scipy.ndimage.morphology.binary_dilation) and find the rotation that maximizes the sum of white pixels along the horizontal dimension.
This works fine, but it takes about 8 seconds per page, which, given the volume of pages I am working with, is way too slow.
Do you know of a better, faster way of accomplishing aligning the text?
Update:
I use scikit-image for image processing functions, and scipy to maximize the count of white pixels along the horizontal axis.
Here is a link to an HTML view of the Jupyter notebook I used to work on this. The code uses some functions from a module I've written for this project, so it cannot be run on its own.
Link to notebook (dropbox): https://db.tt/Mls9Tk8s
Update 2:
Here is a link to the original raw image (dropbox): https://db.tt/1t9kAt0z
Preface: I haven't done much image processing with Python. I can give you an image processing suggestion, but you'll have to implement it in Python yourself. All you need is an FFT and a polar transformation (I think OpenCV has a built-in function for that), so it should be straightforward.
You have only posted one sample image, so I don't know if this works as well for other images, but for this image, a Fourier transform can be very useful: Simply pad the image to a nice power of two (e.g. 2048x2048) and you get a Fourier spectrum like this:
I've posted an intuitive explanation of the Fourier transform here, but in short: your image can be represented as a series of sine/cosine waves, and most of those "waves" are parallel or perpendicular to the document orientation. That's why you see a strong frequency response at roughly 0°, 90°, 180° and 270°. To measure the exact angle, you could take a polar transform of the Fourier spectrum:
and simply take the column-wise mean:
The peak position in that diagram is at 90.835°, and if I rotate the image by -90.835° modulo 90°, the orientation looks decent:
Like I said, I don't have more test images, but it works for rotated versions of your image. At the very least it should narrow down the search space for a more expensive search method.
Note 1: The FFT is fast, but it obviously takes more time for larger images. And sadly the best way to get a better angle resolution is to use a larger input image (i.e. with more white padding around the source image.)
Note 2: the FFT actually returns an image where the "DC" (the center in the spectrum image above) is at the origin 0/0. But the rotation property is clearer if you shift it to the center, and it makes the polar transform easier, so I just showed the shifted version.
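
For reference, here is a minimal sketch of that pipeline in Python with NumPy and OpenCV's cv2.warpPolar. The 2048-pixel pad and 1° angular resolution are assumptions; a finer polar grid would be needed to reach the 90.835° precision quoted above.

import numpy as np
import cv2

def estimate_skew_angle(gray, size=2048):
    # Pad (or crop) the page into a power-of-two square for a fast FFT
    padded = np.zeros((size, size), dtype=np.float32)
    h, w = gray.shape
    padded[:min(h, size), :min(w, size)] = gray[:size, :size]
    # Shifted log-magnitude spectrum, so the DC term sits at the center
    spectrum = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(padded))))
    # Polar transform: one row per degree, one column per radius step
    polar = cv2.warpPolar(spectrum.astype(np.float32), (size // 2, 360),
                          (size // 2, size // 2), size // 2,
                          cv2.WARP_POLAR_LINEAR)
    # Average over radii for each angle; the peak row is the dominant angle
    return float(np.argmax(polar.mean(axis=1))) % 90.0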
This is not a full solution, but there is more than a comment's worth of thoughts here.
You have margins on the left, right, top and bottom of your image. If you remove those, and even cut into the text in the process, you will still have enough information to align the image. So, if you chop, say, 15% off the top, bottom, left and right, you will have reduced your image area by about 50% already, which will speed things up down the line.
Now take your remaining central area and divide it into, say, 10 strips, all of the same height but the full width of the page. Now calculate the mean brightness of those strips and take the 1-4 darkest, as they contain the most (black) lettering. Now work on each of those in parallel, or just on the darkest. You are now processing just the most interesting 5-20% of the page.
Here is the command to do that in ImageMagick - it's just my weapon of choice, and you could do it just as well in Python.
convert scan.jpg -crop 300x433+64+92 -crop x10# -format "%[fx:mean]\n" info:
0.899779
0.894842
0.967889
0.919405
0.912941
0.89933
0.883133 <--- choose the 4th from last because it is the darkest
0.889992
0.88894
0.888865
If I make separate images out of those 10 strips, I get this:
convert scan.jpg -crop 300x433+64+92 -crop x10# m-.jpg
and, effectively, I do the alignment on the fourth-from-last image rather than on the whole image.
Maybe unscientific, but quite effective and pretty easy to try out.
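
If you would rather stay in Python, here is a rough equivalent of the ImageMagick pipeline above, using the same example crop geometry (300x433+64+92); treat the numbers as placeholders for your own pages.

import numpy as np
from skimage import io

page = io.imread("scan.jpg", as_gray=True)
center = page[92:92 + 433, 64:64 + 300]  # crop 300x433+64+92

# Ten full-width strips, ranked by mean brightness; the darkest strips
# carry the most black lettering
strips = np.array_split(center, 10, axis=0)
means = [float(s.mean()) for s in strips]
darkest = int(np.argmin(means))
print("align using strip", darkest, "with mean brightness", means[darkest])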
Another thought: once you have your procedure/script sorted out for straightening a single image, do not forget that you can often get a massive speedup by using GNU Parallel to harass all your CPU's lovely, expensive cores simultaneously. Here I specify 8 processes to run in parallel...
#!/bin/bash
for ((i=0;i<100000;i++)); do
echo ProcessPage $i
done | parallel --eta -j 8
"align the text in the image" I suppose means to deskew the image so that text lines have the same baseline.
I thoroughly enjoyed reading the scientific answers to this quite over-engineered task. The answers are great, but is it really necessary to spend so much time (a very precious resource) implementing this? There is an abundance of tools available for this job without needing to write a single line of code (unless the OP is a CS student and wants to practice the science, but obviously the OP is doing this out of necessity to get all the images processed). These methods took me back to my college years, but today I would use different tools to process this batch quickly and efficiently, which I do daily. I work for a high-volume document conversion and data extraction service bureau and OCR consulting company.
Here is the result of a basic open-and-deskew step in the ABBYY FineReader commercial desktop OCR package. The deskewing was more than sufficient for further OCR processing.
And I did not need to recreate and program my own browser just to post this answer.

OCR of low-resolution text from screenshots

I'm writing an OCR application to read characters from a screenshot image. Currently, I'm focusing only on digits. I'm partially basing my approach on this blog post: http://blog.damiles.com/2008/11/basic-ocr-in-opencv/.
I can successfully extract each individual character using some clever thresholding. Where things get a bit tricky is matching the characters. Even with fixed font face and size, there are some variables such as background color and kerning that cause the same digit to appear in slightly different shapes. For example, the below image is segmented into 3 parts:
Top: a target digit that I successfully extracted from a screenshot
Middle: the template: a digit from my training set
Bottom: the error (absolute difference) between the top and middle images
The parts have all been scaled (the distance between the two green horizontal lines represents one pixel).
You can see that despite both the top and middle images clearly representing a 2, the error between them is quite high. This causes false positives when matching other digits -- for example, it's not hard to see how a well-placed 7 can match the target digit in the image above better than the middle image can.
Currently, I'm handling this by having a heap of training images for each digit, and matching the target digit against those images, one-by-one. I tried taking the average image of the training set, but that doesn't resolve the problem (false positives on other digits).
I'm a bit reluctant to perform matching using a shifted template (it'd be essentially the same as what I'm doing now). Is there a better way to compare the two images than simple absolute difference? I was thinking of maybe something like the EMD (earth mover's distance, http://en.wikipedia.org/wiki/Earth_mover's_distance) in 2D: basically, I need a comparison method that isn't as sensitive to global shifts and small local changes (pixels next to a white pixel becoming white, or pixels next to a black pixel becoming black), but is sensitive to global changes (pixels becoming black when they are nowhere near existing black pixels, and vice versa).
Can anybody suggest a more effective matching method than absolute difference?
I'm doing all this in OpenCV using the C-style Python wrappers (import cv).
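
One cheap approximation of the tolerance described above (my own suggestion, using the modern cv2 bindings rather than the old import cv wrappers) is to blur both images before differencing: one-pixel shifts and edge jitter then cost little, while genuinely misplaced strokes still produce a large error. The kernel size is an assumption to tune.

import cv2
import numpy as np

def soft_difference(target, template, ksize=(5, 5)):
    # Blurring spreads each stroke over its neighborhood, so the absolute
    # difference forgives small shifts but not missing or extra strokes
    a = cv2.GaussianBlur(target.astype(np.float32), ksize, 0)
    b = cv2.GaussianBlur(template.astype(np.float32), ksize, 0)
    return float(np.abs(a - b).mean())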
I would look into using Haar cascades. I've used them for face detection/head tracking, and it seems like you could build up a pretty good set of cascades with enough '2's, '3's, '4's, and so on.
http://alereimondo.no-ip.org/OpenCV/34
http://en.wikipedia.org/wiki/Haar-like_features
OCR on noisy images is not easy, so simple approaches do not work well.
So I would recommend that you use HOG to extract features and an SVM to classify. HOG seems to be one of the most powerful ways to describe shapes.
The whole processing pipeline is implemented in OpenCV; however, I do not know the function names in the Python wrappers. You should be able to train with the latest haartraining.cpp - it actually supports more than Haar: HOG and LBP as well.
And I think the latest code (from trunk) is much improved over the official release (2.3.1).
HOG usually needs just a fraction of the training data used by other recognition methods; however, if you want to classify shapes that are partially occluded (or missing), you should make sure you include some such shapes in the training.
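
A minimal sketch of that HOG-plus-SVM pipeline using scikit-image and scikit-learn (the names train_images, train_labels, and test_digit are placeholders for your own equally sized digit crops and their labels):

import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(image):
    # 9 orientation bins over small cells capture stroke direction well
    return hog(image, orientations=9, pixels_per_cell=(4, 4),
               cells_per_block=(2, 2))

X = np.array([hog_features(img) for img in train_images])
clf = SVC(kernel="linear").fit(X, train_labels)
predicted = clf.predict([hog_features(test_digit)])[0]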
I can tell you from my experience, and from reading several papers on character classification, that a good way to start is by reading about Principal Component Analysis (PCA), Fisher's Linear Discriminant Analysis (LDA), and Support Vector Machines (SVMs). These are classification methods that are extremely useful for OCR, and it turns out that OpenCV already includes excellent implementations of PCA and SVMs. I haven't seen any OpenCV code examples for OCR, but you can use a modified version of face classification to perform character classification. An excellent resource for face recognition code for OpenCV is this website.
Another library for Python that I recommend is "scikits.learn". It is very easy to send cvArrays to scikits.learn and run machine learning algorithms on your data. A basic example of OCR using an SVM is here.
Another more complicated example using manifold learning for handwritten character recognition is here.

PIL vs Python-GD for crop and resize

I am creating custom images that I later convert into an image pyramid for Seadragon AJAX. The images and image pyramid are created using PIL. It currently takes a few hours to generate the images and the image pyramid for approximately 100 pictures that have a combined size of about 32,000,000 by 1,000 pixels (yes, the image is very long and narrow). The performance is roughly similar to another algorithm I have tried (i.e. deepzoom.py). I plan to see whether python-gd would perform better, since most of its functionality is coded in C (from the GD library). I would assume a significant performance increase; however, I am curious to hear the opinions of others. In particular, the resizing and cropping are slow in PIL (with Image.ANTIALIAS). Will this improve considerably if I use Python-GD?
Thanks in advance for the comments and suggestions.
EDIT: The performance difference between PIL and python-GD seems minimal. I will refactor my code to reduce performance bottlenecks and include support for multiple processors. I've tested out the Python 'multiprocessing' module, and the results are encouraging.
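
For what it's worth, the multiprocessing approach can be as simple as the sketch below; process_picture is a hypothetical stand-in for the actual crop/resize/pyramid work, and picture_paths for the input list.

from multiprocessing import Pool
from PIL import Image

def process_picture(path):
    image = Image.open(path)
    # Placeholder work: the real code would crop, resize, and emit tiles
    image.thumbnail((256, 256), Image.ANTIALIAS)
    image.save(path + ".thumb.jpg")

if __name__ == "__main__":
    picture_paths = ["picture-%03d.png" % i for i in range(100)]  # assumed
    with Pool(processes=8) as pool:  # roughly one worker per core
        pool.map(process_picture, picture_paths)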
PIL is mostly in C.
Antialiasing is slow. When you turn off antialiasing, what happens to the speed?
VIPS includes a fast deepzoom creator. I timed deepzoom.py and on my machine I see:
$ time ./wtc.py
real 0m29.601s
user 0m29.158s
sys 0m0.408s
peak RES 450mb
where wtc.jpg is a 10,000 x 10,000 pixel RGB JPG image, and wtc.py is using these settings.
VIPS is around three times faster and needs a quarter of the memory:
$ time vips dzsave wtc.jpg wtc --overlap 2 --tile-size 128 --suffix .png[compression=0]
real 0m10.819s
user 0m37.084s
sys 0m15.314s
peak RES 100mb
I'm not sure why sys is so much higher.
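
The same dzsave call is also reachable from Python through the pyvips binding; a sketch, assuming pyvips is installed (the keyword arguments mirror the CLI flags above).

import pyvips

# Sequential access streams the image instead of loading it whole
image = pyvips.Image.new_from_file("wtc.jpg", access="sequential")
image.dzsave("wtc", overlap=2, tile_size=128, suffix=".png[compression=0]")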
