Image super resolution by stacking multiple captures of a photo using Python

I'm new to image processing and there's a method I found on super resolution using photoshop. The tutorial for the procedure is found here: http://photoncollective.com/enhance-practical-superresolution-in-adobe-photoshop.
Basically, the procedure goes as follows:
Take multiple captures of an image (say 20 shots) and stack them as 20 layers on top of one another.
Resampling (procedure done here uses nearest neighbor sampling)
Change the opacity of each layer from bottom to top such that the opacity = 1/(layer number). For example, if you have 20 layers, make the bottom layer 1/1 = 100%, the second from the bottom 1/2 = 50%, the third 1/3 ≈ 33%, the fourth 1/4 = 25%, and so on until the top layer, which is 1/20 = 5%.
Flatten
Sharpen image using a filter.
My question is: how do I do this same procedure in OpenCV 4 in Python? I'm familiar with how to do step 5 for a single image to get a sharper result. My question is mainly how to do steps 1-4.
Any answer would be much appreciated. I'm even okay with just hints rather than a full answer. Oh, and I'm not sure if I'm using the term correctly, but by stacking I mean putting one picture on top of another, just like how layers work in Photoshop.
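For what it's worth, here is a minimal, untested sketch of how steps 1-4 could look in OpenCV/NumPy. It assumes the shots are already roughly aligned and that shots/*.jpg is a placeholder pattern for your own captures; note that blending layer i at opacity 1/i on top of the stack below it works out to a plain running average of the layers.

import glob
import cv2
import numpy as np

paths = sorted(glob.glob("shots/*.jpg"))   # placeholder pattern for the captures
scale = 2                                  # upsample factor for the resampling step
acc = None
for i, p in enumerate(paths, start=1):
    img = cv2.imread(p).astype(np.float32)
    # Step 2: resample with nearest-neighbour interpolation
    img = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_NEAREST)
    if acc is None:
        acc = img                          # bottom layer at 100% opacity
    else:
        # Step 3: layer i at opacity 1/i over the current stack = running average
        acc = cv2.addWeighted(acc, (i - 1) / i, img, 1.0 / i, 0)
stacked = np.clip(acc, 0, 255).astype(np.uint8)   # Step 4: flatten
cv2.imwrite("stacked.png", stacked)

Step 5 is then whatever sharpening you already use. In practice you may also need to register handheld captures (e.g. with cv2.findTransformECC) before averaging, since the averaging only helps if the sub-pixel offsets between shots are small and random.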

Related

Recursive rectangle subdivision

I'm really curious about this image and I have little to no information about how it was created. Thus, I'm here to research how to do it.
Can someone tell me where to begin? I only know this problem might be related to a recursive subdivision task.
I can only see that the image was divided into 64 blocks initially.
There is some color simplification going on which I don't understand how to achieve and am curious about.
A reference to an algorithm or procedure is enough (Python/C++ only, please).
You could have squares of the average color of that part of the image, check how similar it is to the original using something like image similarity measures, and if it's not good enough, subdivide into 4 squares and make each of them the average color of that part of the image. Repeat this until every square in the image is good enough.
With the help of Google Images I was able to find the name of the person who is in the image: Kenny Cason. With some more research I was able to find the answer.
The problem is related to Quad Tree Images:
Partition the image into four quadrants.
Color each quadrant based on the average color of the pixels in the target image.
Compute each quadrant's squared error between the original target image and the generated image.
Select the quadrant with the highest error and recur into it.
Repeat from step one, using the current highest-error quadrant.
GitHub link.
Kenny Cason's blog.
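For a rough idea of how that error-driven version can be coded, here is a small, untested Python/OpenCV sketch along those lines. The function names, iteration count and input filename are placeholders for illustration, not Kenny Cason's actual implementation.

import heapq
import cv2
import numpy as np

def block_error(block):
    # Sum of squared deviations from the block's mean colour, plus the mean itself.
    mean = block.reshape(-1, block.shape[2]).mean(axis=0)
    return float(((block - mean) ** 2).sum()), mean

def quadtree_render(path, iterations=500, min_size=4):
    img = cv2.imread(path).astype(np.float32)
    out = np.empty_like(img)
    h, w = img.shape[:2]
    err, mean = block_error(img)
    out[:, :] = mean
    # heapq is a min-heap, so store the negated error to pop the worst block first.
    heap = [(-err, 0, 0, h, w)]
    for _ in range(iterations):
        if not heap:
            break
        _, y, x, bh, bw = heapq.heappop(heap)
        if bh < 2 * min_size or bw < 2 * min_size:
            continue
        # Split the worst block into four quadrants and colour each with its average.
        for y0, y1 in ((y, y + bh // 2), (y + bh // 2, y + bh)):
            for x0, x1 in ((x, x + bw // 2), (x + bw // 2, x + bw)):
                block = img[y0:y1, x0:x1]
                e, m = block_error(block)
                out[y0:y1, x0:x1] = m
                heapq.heappush(heap, (-e, y0, x0, y1 - y0, x1 - x0))
    return out.astype(np.uint8)

cv2.imwrite("quadtree.png", quadtree_render("input.jpg"))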

OpenCV: What can cause a mostly black stereovision disparity map?

I have been dipping my toes into OpenCV and the stereovision functions it contains, and am struggling to get good results while following instructions in both the OpenCV documentation and many articles online. Specifically, I believe that at this point I have managed to obtain a decent calibration of my cameras, a decent stereo calibration, and even a decent rectification, but when moving to create the disparity map I seem to get nonsense back.
I am using a set of self-acquired images taken with a Pentax K-3 ii camera using a Loreo Lens-in-a-cap CCD splitter which gives me "two" images taken on one CCD. I can then split the image in half (and trim some of the pixels near the overlap) to have a reliable baseline distance in world coordinates with the camera. I unfortunately have no information on the true focal length of this configuration but I would guess it is around 9cm.
I have performed camera calibration on each split-image set to get camera matrices, distortion coefficients, and object and image points for use in epipolar geometry. Then, following the procedure laid out in [1,2], I perform stereo calibration and rectification. I do not have the required reputation to embed images, so please click here. By my understanding, the fact that similar features in both images are at similar distances from the true horizontal lines I have drawn across them means that this is a good rectification result and should be usable.
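For reference, the standard OpenCV pipeline for that stereo calibration and rectification step looks roughly like the sketch below. The variable names (objpoints, imgpointsL/R, mtxL/distL, img_size, etc.) are placeholders for the per-camera calibration outputs described above, not the asker's actual code.

import cv2 as cv

# objpoints, imgpointsL, imgpointsR, mtxL, distL, mtxR, distR and img_size are
# assumed to come from the per-camera cv.calibrateCamera runs described above.
ret, mtxL, distL, mtxR, distR, R, T, E, F = cv.stereoCalibrate(
    objpoints, imgpointsL, imgpointsR, mtxL, distL, mtxR, distR, img_size,
    flags=cv.CALIB_FIX_INTRINSIC)

# Rectification: RL/RR rotate each camera so the epipolar lines become horizontal.
RL, RR, PL, PR, Q, roiL, roiR = cv.stereoRectify(
    mtxL, distL, mtxR, distR, img_size, R, T, alpha=0)

mapLx, mapLy = cv.initUndistortRectifyMap(mtxL, distL, RL, PL, img_size, cv.CV_32FC1)
mapRx, mapRy = cv.initUndistortRectifyMap(mtxR, distR, RR, PR, img_size, cv.CV_32FC1)
rectL = cv.remap(imgL, mapLx, mapLy, cv.INTER_LINEAR)
rectR = cv.remap(imgR, mapRx, mapRy, cv.INTER_LINEAR)

The roiL and roiR rectangles returned by cv.stereoRectify are the valid-pixel regions mentioned further down; with alpha=0 the rectified images are scaled so that mostly valid pixels remain visible.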
However, when I implement the following code to create the disparity map:
import cv2 as cv
import numpy as np

# Settings for cv.StereoSGBM_create
minDisparity = 1
numDisparities = 64
blockSize = 1
disp12MaxDiff = 1
uniquenessRatio = 10
speckleWindowSize = 0
speckleRange = 8
stereo = cv.StereoSGBM_create(minDisparity=minDisparity, numDisparities=numDisparities,
                              blockSize=blockSize, disp12MaxDiff=disp12MaxDiff,
                              uniquenessRatio=uniquenessRatio,
                              speckleWindowSize=speckleWindowSize, speckleRange=speckleRange)
# Calculate the disparity map (StereoSGBM returns fixed-point disparities scaled by 16)
disp = stereo.compute(imgL, imgR).astype(np.float32)
# Normalize the values to spread them across the viewable range
disp = cv.normalize(disp, None, 0, 255, cv.NORM_MINMAX).astype(np.uint8)
# Resize for display
disp = cv.resize(disp, (1000, 1000))
cv.imshow("disparity", disp)
cv.waitKey(0)
The result is disheartening. Intuitively, seeing a lot of black space surrounding edges which actually are fairly well-defined (such as in the chessboard pattern or near my hands) would suggest that there is very little disparity. However, it seems clear to me that the images are quite different in terms of translation, so I am a bit confused. I have been delving through the documentation and have run out of ideas. I tried reusing the code that produced the initial set of epipolar lines provided here, which seemed to work on the original image quite nicely. However, it produces epipolar lines which are certainly not horizontal. This tells me that something is wrong, but I do not understand what it could be, especially given the "visual test" I described above. I suspect I am misapplying that section of the code.
One thought I have is that I need to use an ROI to select the valid parts of the image, but I am unsure how to go about this. I think this is supported by the odd streaking behavior at the right edge of the left image post-rectification.
This is a link to a pastebin of all of my code, aside from the initial camera calibration which has significant runtime due to the size of the images.
I would appreciate any help that can be offered as at this point I am going a bit codeblind. I am limited to only 8 links due to my reputation, so please let me know if I can provide better images or documentation of my work.

Create a collage of images on a defined area

I want to create a collage of images on a defined area. Since I have a large number of images, I am looking for an algorithm that can solve this problem. The goal of this algorithm should be to maximize the area that is covered by the images.
There are also two rules that should be met:
1.) It is allowed to resize the images, but only proportionally (avoid 'squeezing'; lock the aspect ratio).
2.) There must be a maximum and minimum height and width (in order to avoid that some photos are disproportionately big compared to others, and to prevent the algorithm from shrinking a photo to a size where you can't see the image anymore).
I also have two (optional) goals that should be solved by the algorithm:
3.) The images should have as much contact with the borders as possible.
4.) I am not able to define the second goal algorithmically, so please excuse my loose language here: the algorithm should try to create a 'pretty' distribution of the images. For example, one could agree that the second collage looks prettier than the first one, because there is a more harmonic ratio between the number of 'uncovered-area shapes' and their size. Also, in contrast to the first example, the uncovered-area shapes in the second example take the shape of rectangles, which makes the whole image look calmer:

How to classify an image according to two other given images

I am trying to identify the state of a valve (on or off). My approach is to take two images, one of each state, and compare the current image with those two to see which one it belongs to.
I have tried comparing using new_image - on_image and new_image - off_image, then comparing the number of differing pixels. It works, but I feel like in some cases it might not, and there must be a better way to do a simple classification like this.
Any references or ideas?
Subtracting pixels might not be very robust if your camera position changes slightly. If you don't shy away from using OpenCV, there is an interesting recipe for finding a predefined object in a picture:
Feature Matching + Homography to find Objects
You could cut the lever out of your image and search for it in every new image. Depending on the coordinates and especially the rotation, you can set the status of the valve. This might even work in crazy cases where someone half opened (or, for pessimists, half closed) the valve, or if the lever becomes partially covered.
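A minimal sketch of that feature-matching idea using ORB (the filenames and match count below are placeholders; the linked tutorial uses SIFT with FLANN, but the structure is the same):

import cv2
import numpy as np

template = cv2.imread("lever_template.png", cv2.IMREAD_GRAYSCALE)  # cut-out of the lever
frame = cv2.imread("current_frame.png", cv2.IMREAD_GRAYSCALE)      # new image to classify

orb = cv2.ORB_create(1000)
kp1, des1 = orb.detectAndCompute(template, None)
kp2, des2 = orb.detectAndCompute(frame, None)

# Brute-force Hamming matching suits ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:50]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# A rough estimate of the lever rotation from the homography's upper-left 2x2 block.
angle = np.degrees(np.arctan2(H[1, 0], H[0, 0]))
print("estimated lever rotation: %.1f degrees" % angle)

Thresholding that angle (for example, near 0 degrees = one state, near 90 degrees = the other) would then give the on/off decision.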

Align text for OCR

I am creating a database from historical records which I have as photographed pages from books (100K+ pages). I wrote some Python code to do some image processing before I OCR each page. Since the data in these books does not come in well-formatted tables, I need to segment each page into rows and columns and then OCR each piece separately.
One of the critical steps is to align the text in the image.
For example, this is a typical page that needs to be aligned:
A solution I found is to smudge the text horizontally (I'm using skimage.morphology.binary_dilation) and find the rotation that maximizes the sum of white pixels along the horizontal dimension.
This works fine, but it takes about 8 seconds per page, which given the volume of pages I am working with, is way too much.
Do you know of a better, faster way of aligning the text?
Update:
I use scikit-image for image processing functions, and scipy to maximize the count of white pixels along the horizontal axis.
Here is a link to an html view of the Jupyter notebook I used to work on this. The code uses some functions from a module I've written for this project so it cannot be run on its own.
Link to notebook (dropbox): https://db.tt/Mls9Tk8s
Update 2:
Here is a link to the original raw image (dropbox): https://db.tt/1t9kAt0z
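For reference, a small sketch of the approach described above (horizontal smudging plus a rotation search) using scikit-image and scipy. The filename is a placeholder and the objective function here (variance of the horizontal projection) is one common choice; it may differ from the exact scoring in the notebook.

import numpy as np
from scipy.optimize import minimize_scalar
from skimage import io, transform
from skimage.filters import threshold_otsu
from skimage.morphology import binary_dilation

page = io.imread("page.jpg", as_gray=True)           # placeholder filename
binary = page < threshold_otsu(page)                 # text pixels become True
smudged = binary_dilation(binary, np.ones((1, 25)))  # smear the text horizontally

def score(angle):
    rotated = transform.rotate(smudged.astype(float), angle)
    # Row sums become "peakier" when the text lines are horizontal, so use the
    # variance of the horizontal projection (negated, since we minimise).
    return -rotated.sum(axis=1).var()

result = minimize_scalar(score, bounds=(-5, 5), method="bounded")
print("estimated skew: %.2f degrees" % result.x)

A bounded search like this still rotates the full image at every evaluation, which is probably where most of the 8 seconds per page goes.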
Preface: I haven't done much image processing with Python, so I can give you an image-processing suggestion, but you'll have to implement it in Python yourself. All you need is an FFT and a polar transformation (I think OpenCV has a built-in function for that), so it should be straightforward.
You have only posted one sample image, so I don't know if this works as well for other images, but for this image, a Fourier transform can be very useful: Simply pad the image to a nice power of two (e.g. 2048x2048) and you get a Fourier spectrum like this:
I've posted an intuitive explanation of the Fourier transform here, but in short: your image can be represented as a series of sine/cosine waves, and most of those "waves" are parallel or perpendicular to the document orientation. That's why you see a strong frequency response at roughly 0°, 90°, 180° and 270°. To measure the exact angle, you could take a polar transform of the Fourier spectrum:
and simply take the columnwise mean:
The peak position in that diagram is at 90.835°, and if I rotate the image by -90.835 modulo 90, the orientation looks decent:
Like I said, I don't have more test images, but it works for rotated versions of your image. At the very least it should narrow down the search space for a more expensive search method.
Note 1: The FFT is fast, but it obviously takes more time for larger images. And sadly the best way to get a better angle resolution is to use a larger input image (i.e. with more white padding around the source image.)
Note 2: the FFT actually returns an image where the "DC" (the center in the spectrum image above) is at the origin 0/0. But the rotation property is clearer if you shift it to the center, and it makes the polar transform easier, so I just showed the shifted version.
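A rough OpenCV/NumPy sketch of that idea (pad, FFT, shift, polar transform, mean over radius). The padding size and one-degree angular resolution are arbitrary choices here, and cv2.warpPolar requires a reasonably recent OpenCV:

import cv2
import numpy as np

img = cv2.imread("scan.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Pad to a nice power of two so the FFT is fast.
size = 2048
padded = np.zeros((size, size), np.float32)
padded[:img.shape[0], :img.shape[1]] = img[:size, :size]

# Log-magnitude spectrum, shifted so the DC component sits in the centre.
spectrum = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(padded))))

# Polar transform around the centre: each output row is one degree of angle.
polar = cv2.warpPolar(spectrum, (size // 2, 360), (size / 2, size / 2),
                      size / 2, cv2.WARP_POLAR_LINEAR)

# Average over radius; the peak row gives the document orientation (mod 90 deg).
profile = polar.mean(axis=1)
angle = profile.argmax() % 90
print("estimated skew: about %d degrees" % angle)

Increasing the number of angle rows in the warpPolar call (e.g. 3600 for 0.1° steps) refines the estimate, in line with Note 1 above about resolution.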
This is not a full solution, but here are more than a comment's worth of thoughts.
You have a margin on the left, right, top and bottom of your image. If you remove that, and even cut into the text in the process, you will still have enough information to align the image. So, if you chop, say, 15% off the top, bottom, left and right, you will have already reduced your image area by 50%, which will speed things up down the line.
Now take your remaining central area and divide it into, say, 10 strips, all of the same height but the full width of the page. Calculate the mean brightness of those strips and take the 1-4 darkest, as they contain the most (black) lettering. Now work on each of those in parallel, or just on the darkest. You are now processing only the most interesting 5-20% of the page.
Here is the command to do that in ImageMagick - it's just my weapon of choice and you can do it just as well in Python.
convert scan.jpg -crop 300x433+64+92 -crop x10# -format "%[fx:mean]\n" info:
0.899779
0.894842
0.967889
0.919405
0.912941
0.89933
0.883133 <--- choose 4th last because it is darkest
0.889992
0.88894
0.888865
If I make separate images out of those 10 stripes, I get this
convert scan.jpg -crop 300x433+64+92 -crop x10# m-.jpg
and effectively, I do the alignment on the fourth last image rather than the whole image.
Maybe unscientific, but quite effective and pretty easy to try out.
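The same strip-selection trick in Python (NumPy/OpenCV rather than ImageMagick), with the crop fraction and strip count as arbitrary placeholders:

import cv2
import numpy as np

img = cv2.imread("scan.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# Chop roughly 15% off each border, as suggested above.
h, w = img.shape
crop = img[int(0.15 * h):int(0.85 * h), int(0.15 * w):int(0.85 * w)]

# Cut the remaining area into 10 horizontal strips and rank them by mean brightness.
strips = np.array_split(crop, 10, axis=0)
means = [float(s.mean()) for s in strips]
darkest = int(np.argmin(means))
print("strip means:", ["%.4f" % m for m in means])
print("deskewing on strip", darkest)
strip = strips[darkest]   # feed only this strip to the rotation search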
Another thought: once you have your procedure/script sorted out for straightening a single image, do not forget you can often get a massive speedup by using GNU Parallel to harass all your CPU's lovely, expensive cores simultaneously. Here I specify 8 processes to run in parallel...
#!/bin/bash
for ((i=0;i<100000;i++)); do
   echo ProcessPage $i
done | parallel --eta -j 8
"align the text in the image" I suppose means to deskew the image so that text lines have the same baseline.
I thoroughly enjoyed reading the scientific answers to this quite over-engineered task. The answers are great, but is it really necessary to spend so much time (a very precious resource) to implement this? There is an abundance of tools available for this function without needing to write a single line of code (unless the OP is a CS student and wants to practice the science, but obviously the OP is doing this out of necessity to get all the images processed). These methods took me back to my college years, but today I would use different tools to process this batch quickly and efficiently, which I do daily; I work for a high-volume document conversion and data extraction service bureau and OCR consulting company.
Here is the result of a basic open-and-deskew step in the ABBYY FineReader commercial desktop OCR package. The deskewing was more than sufficient for further OCR processing.
And I did not need to recreate and program my own browser just to post this answer.
