I am in the process of porting a code I wrote in IDL (interactive data language) to python but am running into a bit of a problem that I am hoping someone can help me with.
The code goes like this:
take individual classified Landsat geotiffs (say there are N individual 1-band files per scene, each representing a different day) and further reduce these images to three binary-themed 1-band images (water and not water, land and not land, water/land and not water/land). This will be done by reading the rasters as matrices and replacing values.
** I don't actually need to have these images, so I can save them as memory or just keep them as numpy ndarrays to move to the next step
stack these images/arrays to produce 3 different (1 for each 'element') N-band stacks (or a 3-dimensional array-- (samples, lines, N)) for each scene
total the stacks to get a total number of water/land/water&land observations per pixel (produces one 1-band total image for each scene)
other stuff
The problem I am running into is when I get to the stacking, as the individual images for each scene vary in size, although they mostly overlap with each other. I originally used an ENVI layer-stacking routine that takes the N different-sized 1-band images for each scene and stacks them into an N-band image with an extent that encompasses all of the images' extents, and then reading the resulting rasters in as 3-d arrays to do the totals. I would like to do something similar with gdal/python but am not sure how to go about doing so. I was thinking I would implement gdal capabilities of geotiffs by using the geotransform info of the images to somehow find the inclusive extent, possibly padding the edges of the images with 0's so they are all the same size, stacking these images/3-d arrays so that they are correctly aligned, then computing the totals. Hopefully there is something more direct in gdal (or in any other open source package for python), as I'm not sure how I would pull that off.
Does anyone have any suggestions or ideas as to what would be the most efficient way (or any way really), to do what I need to do? I'm open to anything.
Thanks so much,
Maggie
Related
I'm very new to image processing in Python (and not massively adept at python in general), so forgive me for how stupid this may sound. Im working with an AI for object detection, and need to submit 1000x1000 pixel images to it, that have been divided up from larger images of varying lengths and widths (not necessarily divisible, but I have a way of padding out images less than 1000x1000). In order for this to work, I need 200 pixel overlap on each segment or the AI will pick may miss objects.
I've tried a host of methods, and have either got the image to divide up using the methods suggested in Creating image tiles (m*n) of original image using Python and Numpy and how can I split a large image into small pieces in python (plus a few others that effectively do the same techniques in different words. I've been able to make a grid and get the tile names from this, using How to determine coordinate of grid elements of an image, however have not been able to get overlap to work in this, as I would then just tile it normally.
Basically what I'm saying is that I've found one way to cut the images up that works, and one way to get the tile coordinates, but I am utterly failing at putting it all together. Does anyone have any advice on what to do here?
So far I've not found a direct approach to my end goal online - and I've tried mucking around with different scripts (like the ones listed above), but feel like Im barking up totally the wrong tree.
Im trying to remove the differences between two frames and keep the non-chaning graphics. Would probably repeat the same process with more frames to get more accurate results. My idea is to simplify the frames removing things that won't need to simplify the rest of the process that will do after.
The different frames are coming from the same video so no need to deal with different sizes, orientation, etc. If the same graphic its in another frame but with a different orientation or scale, I would like to also remove it. For example:
Image 1
Image 2
Result (more or less, I suppose that will be uglier but containing a similar information)
One of the problems of this idea is that the source video, even if they are computer generated graphics, is compressed so its not that easy to identify if a change on the tonality of a pixel its actually a change or not.
Im ideally not looking at a pixel level and given the differences in saturation applied by the compression probably is not possible. Im looking for unchaged "objects" in the image. I want to extract the information layer shown on top of whats happening behind it.
During the last couple of days I have tried to achieve it in a Python script by using OpenCV with all kinds of combinations of absdiffs, subtracts, thresholds, equalizeHists, canny but so far haven't found the right implementation and would appreciate any guidance. How would you achieve it?
Im ideally not looking at a pixel level and given the differences in saturation applied by the compression probably is not possible. Im looking for unchaged "objects" in the image. I want to extract the information layer shown on top of whats happening behind it.
This will be extremely hard. You would need to employ proper CV and if you're not an expert in that field, you'll have really hard time.
How about this, forgetting about tooling and libs, you have two images, ie. two equally sized sequences of RGB pixels. Image A and Image B, and the output image R. Allocate output image R of the same size as A or B.
Run a single loop for every pixel, read pixel a and from A and pixel b from B. You get a 3-element (RGB) vector. Find distance between the two vectors, eg. magnitude of a vector (b-a), if this is less than some tolerance, write either a or b to the same offset into result image R. If not, write some default (background) color to R.
You can most likely do this with some HW accelerated way using OpenCV or some other library, but that's up to you to find a tool that does what you want.
I am trying to perform image registration on potentially hundreds of aerial images taken from a camera mounted on a UAV. I think it is safe to assume that I know the ordering of the images, and hopefully, sequential images will overlap.
I have read some papers that suggest using a CNN to find the homography matrix can vastly outperform the old school feature descriptor matching with RANSAC song and dance. My issue is that I don't quite understand how to stitch more than 2 images together. It seems to me that to register image 100 in the same coordinate frame as image 1 using the cv2.warpPerspective function, I would do I100H1H2*H3...H99. Even if the error in each transform is small after 100 applications it seems like it would be huge. My understanding is that the solution to this problem is bundle adjustment.
I have looked into bundle adjustment a little bit but Im struggling to see how exactly I can use it. I have read the paper that many related stack overflow posts suggest "Automatic Panoramic Image Stitching using Invariant Features". In the section on bundle adjustment IF I understand the authors suggest that after building the initial panorama it is likely that image A will eventually overlap with multiple other images. Using the matched feature points in any images that overlap with A they basically calculate some adjustment...? I think to image A?
My question is using openCV how do I apply this adjustment? Let's say I have 3 images I1, I2, I3 all overlapping for a minimal example.
#assuimg CNN model predicts transform
#I think the first step is find the homography between all images
H12 = cnnMod.predict(I1,I2)
H13 = cnnMod.predict(I1,I3)
H23 = cnnMod.predict(I2,I3)
outI2 = cv2.warpPerspective(I2,H12,(maxWidth, maxHeight),flags=cv2.INTER_LINEAR)
outI3 = cv2.warpPerspective(I2,H23,(maxWidth, maxHeight),flags=cv2.INTER_LINEAR)
#now would I do some bundle voodoo?
#what would it look like?
#which of the bundler classes should I use?
#would it look like this?
#or maybe the input is features?
voodoo = cv2.bundleVoodoo([H12,H13,H23])
golaballyRectifiedI2 = cv2.warpPerspective(outI2,voodoo[2],(maxWidth, maxHeight),flags=cv2.INTER_LINEAR)
The code is my best guess at what a solution might look like but clearly I have no idea what I am doing. I've not been able to find anything that actually shows how the bundle adjustment is done.
The basic idea underlying image alignment through bundle adjustment is that, rather than matching pairs of 2D points (x, x') across pairs of images, you posit the existence of 3d points X that, ideally, project onto matched tuples of 2D points (x, x', x'', ...) matched among corresponding tuples of images. You then solve for the location of the X's and the camera parameters (extrinsics, and intrinsics if the camera is uncalibrated) that minimize the (robustified, usually) RMS reprojection error over all 2d points and images.
Depending on your particular setup and scene, you may make some simplifying assumptions, e.g.:
That the X's all belong to the same plane (which you can arbitrarily choose as the world's Z=0 plane). This is useful, for example, when stitching images of a painting, or aerial images on relatively flat ground with relatively small extent so one can ignore the earth's curvature.
Or that the X's are all on the WGS84 ellipsoid.
Both the above assumptions remove one free coordinate from X, effectively reducing the problem's dimensionality.
I have a camera in a fixed position looking at a target and I want to detect whether someone walks in front of the target. The lighting in the scene can change so subtracting the new changed frame from the previous frame would therefore detect motion even though none has actually occurred. I have thought to compare the number of contours (obtained by using findContours() on a binary edge image obtained with canny and then getting size() of this) between the two frames as a big change here could denote movement while also being less sensitive to lighting changes, I am quite new to OpenCV and my implementations have not been successful so far. Is there a way I could make this work or will I have to just subtract the frames. I don't need to track the person, just detect whether they are in the scene.
I am a bit rusty but there are various ways to do this.
SIFT and SURF are very expensive operations, so I don't think you would want to use them.
There are a couple of 'background removal' methods.
Average removal: in this one you get the average of N frames, and consider it as BG. This is vulnerable to many things, light changes, shadow, moving object staying at a location for long time etc.
Gaussian Mixture Model: a bit more advanced than 1. Still vulnerable to a lot of things.
IncPCP (incremental principal component pursuit): I can't remember the algorithm totally but basic idea was they convert each frame to a sparse form, then extract the moving objects from sparse matrix.
Optical flow: you find the change across the temporal domain of a video. For example, you compare frame2 with frame1 block by block and tell the direction of change.
CNN based methods: I know there are a bunch of them, but I didn't really follow them. You might have to do some research. As far as I know, they often are better than the methods above.
Notice that, for a #30Fps, your code should complete in 33ms per frame, so it could be real time. You can find a lot of code available for this task.
There are a handful of ways you could do this.
The first that comes to mind is doing a 2D FFT on the incoming images. Color shouldn't affect the FFT too much, but an object moving, entering/exiting a frame will.
The second is to use SIFT or SURF to generate a list of features in an image, you can insert these points into a map, sorted however you like, then do a set_difference between the last image you took, and the current image that you have. You could also use the FLANN functionality to compare the generated features.
I am currently working on a summer research project and we have generated 360 slices of a tumor. I now need to compile (if that's the right word) these images into one large 3D image. Is there a way to do this with either a python module or an outside source? I would prefer to use a free software if that is possible.
Perhaps via matplotlib, but anyway may require preprocessing I suppose:
https://www.youtube.com/watch?v=5E5mVVsrwZw
In your case, the z axis (3rd dimension) should be specified by your vector of images. Nonetheless, before proceeding, I suppose you would need to extract the shapes of the object you want to reconstruct. For instance, if i take any image of the many 2D you have, I expect to find RGB value for each pixel, but then, for instance if you want to plot a skull like in the video link, as I understand you would need to extract the borders of your object and from each of its frame (2D shape) and then plot their serie. But anyway, the processing may depend on the encoding of the information you have. Perhaps is sufficient to simply plot the series of images.
Some useful link I found:
https://www.researchgate.net/post/How_to_reconstruct_3D_images_from_two_or_four_2D_images
Python: 3D contour from a 2D image - pylab and contourf