Python sliding window for images to be used in a CNN - python

This is something that I'm sure there must be a package out there for, but for the life of me I cannot find it.
I've trained a CNN on images of a given size to look for heads, and I now want to give it a larger image to search within for heads. The way this is typically done is with a sliding window: a sub-image is taken from the larger image, and the bounding box of that sub-image is slid across the larger image, often with significant overlap (maybe 50%). Also, since the thing I'm looking for might be bigger or smaller than in the training data, I need to start with a huge bounding box (say 3 times the size of the training images), scale it down to the training size, slide it over the image, then try again with something 2.5x the size, then 2x, 1.5x, 1x, 0.75x, 0.5x, 0.25x, etc.
It's not too complicated for me to write on my lonesome, but my implementation will be slow and messy. There must be a Python package that does this. What's it called?!
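For reference, a multi-scale sliding window is only a few lines with OpenCV and NumPy. A minimal sketch is below; the window size, scale list, overlap and the model.predict call are placeholders to replace with your own values and classifier:

```python
# Multi-scale sliding window: slide a box over the image at several scales,
# resize each patch back to the training size, and hand it to the classifier.
import cv2
import numpy as np

def sliding_windows(image, win=64,
                    scales=(3.0, 2.5, 2.0, 1.5, 1.0, 0.75, 0.5, 0.25),
                    overlap=0.5):
    """Yield (x, y, scale, patch) with patch resized to win x win."""
    h, w = image.shape[:2]
    for s in scales:
        box = int(round(win * s))                  # bounding box size at this scale
        if box > min(h, w):
            continue                               # box larger than the image, skip
        step = max(1, int(box * (1 - overlap)))    # 50% overlap by default
        for y in range(0, h - box + 1, step):
            for x in range(0, w - box + 1, step):
                patch = cv2.resize(image[y:y + box, x:x + box], (win, win))
                yield x, y, s, patch

# Usage with a hypothetical trained classifier:
# for x, y, s, patch in sliding_windows(big_image):
#     score = model.predict(patch[np.newaxis])
```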

Related

Remove differences between two video frames

I'm trying to remove the differences between two frames and keep the non-changing graphics. I would probably repeat the same process with more frames to get more accurate results. My idea is to simplify the frames by removing things I won't need, in order to simplify the rest of the processing I will do afterwards.
The frames come from the same video, so there is no need to deal with different sizes, orientation, etc. If the same graphic is in another frame but with a different orientation or scale, I would like to remove it as well. For example:
Image 1
Image 2
Result (more or less; I suppose it will be uglier, but containing similar information)
One of the problems with this idea is that the source video, even though it is computer-generated graphics, is compressed, so it's not that easy to tell whether a change in the tonality of a pixel is actually a change or not.
Ideally I'm not looking at the pixel level, and given the differences in saturation introduced by the compression that is probably not possible anyway. I'm looking for unchanged "objects" in the image. I want to extract the information layer shown on top of what's happening behind it.
During the last couple of days I have tried to achieve this in a Python script using OpenCV with all kinds of combinations of absdiff, subtract, threshold, equalizeHist and Canny, but so far I haven't found the right implementation and would appreciate any guidance. How would you achieve it?
Ideally I'm not looking at the pixel level, and given the differences in saturation introduced by the compression that is probably not possible anyway. I'm looking for unchanged "objects" in the image. I want to extract the information layer shown on top of what's happening behind it.
This will be extremely hard. You would need to employ proper CV, and if you're not an expert in that field, you'll have a really hard time.
How about this: forgetting about tooling and libraries, you have two images, i.e. two equally sized sequences of RGB pixels, image A and image B, plus the output image R. Allocate the output image R with the same size as A or B.
Run a single loop over every pixel, reading pixel a from A and pixel b from B. Each is a 3-element (RGB) vector. Find the distance between the two vectors, e.g. the magnitude of the vector (b - a); if this is less than some tolerance, write either a or b to the same offset in the result image R. If not, write some default (background) color to R.
You can most likely do this in a hardware-accelerated way using OpenCV or some other library, but it's up to you to find a tool that does what you want.
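As a rough illustration of that per-pixel test, vectorised with NumPy rather than an explicit loop (the file names and the tolerance value are assumptions to tune for your footage):

```python
# Minimal sketch of the per-pixel distance test described above.
import cv2
import numpy as np

a = cv2.imread("frame_a.png").astype(np.float32)  # hypothetical file names
b = cv2.imread("frame_b.png").astype(np.float32)

tolerance = 20.0                               # assumed threshold, tune it
dist = np.linalg.norm(b - a, axis=2)           # magnitude of (b - a) per pixel

result = np.zeros_like(a)                      # default/background color (black)
mask = dist < tolerance                        # pixels considered "unchanged"
result[mask] = a[mask]                         # keep the unchanged pixel values

cv2.imwrite("result.png", result.astype(np.uint8))
```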

What type of input does ResNet need?

I am new to deep learning, and I am trying to train a ResNet50 model to classify 3 different surgical tools. The problem is that every article I read tells me that I need to use 224 X 224 images to train ResNet, but the images I have are of size 512 X 288.
So my questions are:
Is it possible to use 512 X 288 images to train ResNet without cropping the images? I do not want to crop the image because the tools are positioned rather randomly inside the image, and I think cropping the image will cut off part of the tools as well.
For the training and test set images, do I need to draw a rectangle around the object I want to classify?
Is it okay if multiple different objects are in one image? The data set I am using often has multiple tools appearing in one image, and I wonder if I must only use images that only have one tool appearing at a time.
If I were to crop the images to fit one tool, will it be okay even if the sizes of the images vary?
Thank you.
Is it possible to use 512 X 288 images to train ResNet without cropping the images? I do not want to crop the image because the tools are positioned rather randomly inside the image, and I think cropping the image will cut off part of the tools as well.
Yes, you can train ResNet without cropping your images. You can resize them, or, if that's not possible for some reason, you can alter the network, e.g. add global pooling at the very end and account for the different input size (you might also need to change kernel sizes or the downsampling rate).
If your biggest issue is that ResNet requires 224x224 while your images are of size 512x288, the simplest solution is to first resize them to 224x224. Only if that's not a possibility for some technical reason, create a fully convolutional network by adding global pooling at the end. (I believe ResNet already has global pooling at the end; in case it does not, you can add it.)
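As a minimal sketch of both options, assuming tf.keras and a 3-class setup (the weights, input shape and optimizer are placeholders to adapt):

```python
# Option A: simply resize each image to 224x224 before feeding a standard ResNet50,
#   e.g. img224 = cv2.resize(img, (224, 224))
# Option B: keep 512x288 and rely on global average pooling so ResNet50
#   accepts the larger input; add a 3-class head on top.
import tensorflow as tf

backbone = tf.keras.applications.ResNet50(
    include_top=False,          # drop the 224x224-specific classifier head
    weights="imagenet",
    input_shape=(288, 512, 3),  # (height, width, channels) for 512x288 images
    pooling="avg",              # global average pooling at the end
)
outputs = tf.keras.layers.Dense(3, activation="softmax")(backbone.output)
model = tf.keras.Model(backbone.input, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```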
For the training and test set images, do I need to draw a rectangle around the object I want to classify?
For classification, no, you do not. Having a bounding box for an object is only needed if you want to do detection (that's when you want your model to also draw a rectangle around the objects of interest).
Is it okay if multiple different objects are in one image? The data set I am using often has multiple tools appearing in one image, and I wonder if I must only use images that only have one tool appearing at a time.
It's OK to have multiple different objects in one image, as long as they do not belong to different classes that you are training against. That is, if you are trying to classify apples vs. oranges, an image obviously cannot contain both of them at the same time. But if it contains anything else, a screwdriver, a key, a person, a cucumber, etc., that's fine.
If I were to crop the images to fit one tool, will it be okay even if the sizes of the images vary?
It depends on your model. Cropping and image size are two different things: you can crop an image of any size and then resize it to your desired dimensions. You usually want all images to have the same size, as it makes your life easier, but it's not a hard requirement, and depending on your needs you can work with varying image sizes as well.

Create a collage of images on a defined area

I want to create a collage of images on a defined area. Since I have a large number of images, I am looking for an algorithm that can solve this problem. The goal of the algorithm should be to maximize the area that is covered by the images.
There are also two rules that should be met:
1.) It is allowed to resize the images, but only proportionally (avoid 'squeezing'; lock the aspect ratio).
2.) There must be a maximum and a minimum height and width (in order to avoid some photos being disproportionately big compared to others, and to prevent the algorithm from shrinking a photo to a size where you can't see the image anymore).
I also have two (optional) goals that the algorithm should address:
3.) The images should have as much contact with the borders as possible.
4.) I am not able to define the second goal algorithmically, so please excuse my loose language here: the algorithm should try to create a 'pretty' distribution of the images. For example, one could agree that the second collage looks prettier than the first one, because there is a more harmonious ratio between the number of uncovered-area shapes and their size. Also, in contrast to the first example, the uncovered-area shapes in the second example take the form of rectangles, which makes the whole image look 'calmer'.
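As a small sketch of rules 1.) and 2.) only, a single scale factor keeps the aspect ratio locked while clamping the dimensions (the min/max bounds here are placeholder values):

```python
# Proportional resize with min/max bounds (rules 1 and 2); bounds are assumed values.
def clamp_scale(w, h, min_side=120, max_side=480):
    """Return one scale factor that keeps w x h inside the bounds, aspect ratio locked."""
    scale = 1.0
    scale = min(scale, max_side / w, max_side / h)  # shrink if either side is too big
    scale = max(scale, min_side / w, min_side / h)  # grow if either side is too small
    return scale

# Example: a 2000 x 1000 image gets scale 0.24, i.e. 480 x 240, aspect ratio preserved.
```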

Find Coordinates of cropped image (JPG) from its original

I have a database of original images, and for each original image there are various cropped versions.
This is an example of what the images look like:
Original
Horizontal Crop
Square Crop
This is a very simple example, but most images are like this; some crops may take a smaller section of the original image than others.
I was looking at OpenCV in Python, but I'm very new to this kind of image processing.
The idea is to save the cropping information separately from the image to save space, and then generate all the crops and different aspect ratios on the fly with a cache system instead.
The method you are looking for is called "template matching". You can find examples here:
https://docs.opencv.org/trunk/d4/dc6/tutorial_py_template_matching.html
For your problem, given the large images, it might be a good idea to constrain the search space by resizing both images by the same factor. The position you find that way isn't as precise, but it lets you constrain the full-resolution search to a smaller region around that point.
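A minimal template-matching sketch along those lines, assuming the crop was not rescaled relative to the original (file names are placeholders; a rescaled crop would additionally need a search over scales):

```python
# Find where the crop sits inside the original and keep only the coordinates.
import cv2

original = cv2.imread("original.jpg")   # hypothetical file names
crop = cv2.imread("crop.jpg")

res = cv2.matchTemplate(original, crop, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(res)

x, y = max_loc
h, w = crop.shape[:2]
print(f"crop box: x={x}, y={y}, w={w}, h={h}, score={max_val:.3f}")

# Store (x, y, w, h) instead of the cropped file and regenerate the crop
# later as original[y:y+h, x:x+w].
```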

georeference/stack geotiffs of different sizes using python/gdal

I am in the process of porting code I wrote in IDL (Interactive Data Language) to Python, but I am running into a bit of a problem that I am hoping someone can help me with.
The code goes like this:
1. Take individual classified Landsat GeoTIFFs (say there are N individual 1-band files per scene, each representing a different day) and further reduce these images to three binary-themed 1-band images (water and not water, land and not land, water/land and not water/land). This is done by reading the rasters as matrices and replacing values. (I don't actually need to keep these images, so I can hold them in memory or just keep them as NumPy ndarrays for the next step.)
2. Stack these images/arrays to produce 3 different N-band stacks (one for each 'element'), i.e. a 3-dimensional array (samples, lines, N) for each scene.
3. Total the stacks to get the total number of water/land/water-and-land observations per pixel (producing one 1-band total image for each scene).
4. Other stuff.
The problem I am running into is the stacking, as the individual images for each scene vary in size, although they mostly overlap with each other. I originally used an ENVI layer-stacking routine that takes the N different-sized 1-band images for each scene, stacks them into an N-band image with an extent that encompasses all of the images' extents, and then reads the resulting rasters in as 3-D arrays to do the totals. I would like to do something similar with GDAL/Python but am not sure how to go about it. I was thinking I would use the geotransform info of the GeoTIFFs to somehow find the inclusive extent, possibly pad the edges of the images with 0s so they are all the same size, stack these images/3-D arrays so that they are correctly aligned, and then compute the totals. Hopefully there is something more direct in GDAL (or in any other open-source Python package), as I'm not sure how I would pull that off.
Does anyone have any suggestions or ideas as to what would be the most efficient way (or any way, really) to do what I need to do? I'm open to anything.
Thanks so much,
Maggie
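For what it's worth, a minimal sketch of the approach described above, assuming the osgeo GDAL Python bindings and hypothetical file names: gdal.Warp resamples each raster onto a common union grid (padding the missing edges with 0) before the stack is totalled.

```python
# Align N single-band classified rasters onto their union extent, stack, and total.
import numpy as np
from osgeo import gdal

scene_files = ["day01.tif", "day02.tif", "day03.tif"]  # hypothetical inputs

# 1. Union extent and a common pixel size from the geotransforms.
xmins, ymins, xmaxs, ymaxs = [], [], [], []
xres = yres = None
for f in scene_files:
    ds = gdal.Open(f)
    gt = ds.GetGeoTransform()
    xres, yres = gt[1], abs(gt[5])
    xmins.append(gt[0])
    ymaxs.append(gt[3])
    xmaxs.append(gt[0] + gt[1] * ds.RasterXSize)
    ymins.append(gt[3] + gt[5] * ds.RasterYSize)
    ds = None
bounds = (min(xmins), min(ymins), max(xmaxs), max(ymaxs))  # (minX, minY, maxX, maxY)

# 2. Warp each raster onto that grid in memory (areas outside a scene become 0)
#    and stack the layers into a (lines, samples, N) array.
layers = []
for f in scene_files:
    warped = gdal.Warp("", f, format="MEM", outputBounds=bounds,
                       xRes=xres, yRes=yres, dstNodata=0)
    layers.append(warped.ReadAsArray())
    warped = None
stack = np.stack(layers, axis=-1)

# 3. Total the stack, e.g. the number of "water" (value 1) observations per pixel.
water_total = (stack == 1).sum(axis=-1)
```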
