I am studying image segmentation and have some simulation results.
I wonder how to generate an image like the attached one.
This figure was made by the authors of the paper "Learning Deep Features for Discriminative Localization", whose well-known concept is the class activation map (CAM) (Fig. 1 in the paper).
The images in the first row are the input images and the images in the second row are the output images.
In order to generate the output, another input, a mask image, might be required.
The value of each pixel of the mask image ranges from 0 to 1.
A pixel whose mask value is 1 will be colored red, and one whose value is 0 will be blue.
I tried to find a method to do that, but I have no idea what keywords I should use.
I also tried to find a way to compute the values of the output images, but that was also unclear.
I thought the output could be generated by simple interpolation.
For example, if the mask value is 1, I thought the output could be the average of the original pixel and [255, original, original] (in RGB representation).
Can this be done simply in Python?
The method may indeed be interpolation, but I cannot find the exact values.
Any link or keyword to point me in the right direction would be greatly appreciated.
It looks like the "heatmap" is just being drawn with fixed transparency on top of the photo.
To answer this I'd:
create a 2D (x, y) activation map; each value is the "activation" level at that pixel
transform this into a 3D (x, y, RGB) image through a color map; each value is turned into an RGB triple, e.g. 0 = blue, 1 = red in your example. I'd also suggest using a different colormap; the example one is particularly bad with respect to perceptual uniformity
use PIL to blend the map on top of the original (see the sketch below)
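A minimal sketch of those three steps (the file name and the random activation map are placeholders for your own data):

import numpy as np
from PIL import Image
from matplotlib import cm

photo = Image.open("photo.jpg").convert("RGB")          # placeholder file name
activation = np.random.rand(photo.height, photo.width)  # stand-in for your mask, values in [0, 1]

# Step 2: map each activation value to an RGB triple via a colormap
# ('jet' gives the blue-to-red look; 'viridis' is perceptually uniform).
heatmap = Image.fromarray((cm.jet(activation)[:, :, :3] * 255).astype(np.uint8))

# Step 3: blend with fixed transparency (alpha=0.5 is an even mix).
overlay = Image.blend(photo, heatmap, alpha=0.5)
overlay.save("overlay.png")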
Hope that helps!
I am struggling to get my KNN object detection script to tell the difference between two almost identical objects.
The obvious solution, which I am doing now, is to add more reference images.
I was wondering if there is anything else I could do to more easily distinguish between the two objects. They are both beverage containers with the same shape and colors; the first has a slight bluish tint and the other is plain white with a shiny finish.
Right now I am feeding the image of my objects to VGG19 and capturing the feature map output of the 1x4096 flatten layer, then appending the five most dominant colors in RGB format along with the pixel height and width of the image crop of the object. This gives me a one-dimensional vector of length 4113 on which I can perform the KNN search.
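In code, the pipeline looks roughly like this (a simplified sketch: "fc1" is the first 4096-unit layer in Keras's VGG19, and I'm approximating the dominant-color step with KMeans):

import numpy as np
import cv2
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.models import Model
from sklearn.cluster import KMeans

base = VGG19(weights="imagenet")
feat_model = Model(inputs=base.input, outputs=base.get_layer("fc1").output)

def feature_vector(crop):
    # crop: HxWx3 uint8 image crop of the object
    x = cv2.resize(crop, (224, 224)).astype("float32")[None]
    deep = feat_model.predict(preprocess_input(x))[0]            # 4096 deep features
    colors = KMeans(n_clusters=5).fit(
        crop.reshape(-1, 3).astype("float32")).cluster_centers_.ravel()  # 5 dominant RGB colors
    return np.concatenate([deep, colors, crop.shape[:2]])        # 4096 + 15 + 2 = 4113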
I am curious whether adding things like saturation and contrast of the image would help the KNN search find the small differences, or whether there is something else I could try?
Best regards
Martin
I hope you're all doing well!
I'm new to image manipulation, so I want to apologize right here for my simple question. I'm currently working on a problem that involves classifying an object called a jet into two known categories. This object is made of sub-objects. My idea is to use these sub-objects to transform each jet into a pixel image, and then apply convolutional neural networks to find the patterns.
Here is an example of the pixel images:
(Image: jet's constituents pixel distribution)
To standardize all the images, I want to find the two most intense pixels and make sure the axis connecting them is vertical, with the most intense pixel at the top. It would also be good to impose that one side (left or right) of the image contains the majority of the intensity, and to normalize the total intensity of the whole image to 1.
My question is: as I'm new to this kind of processing, I don't know if there is a library in Python that can handle these operations. Are you aware of any?
PS: the picture was taken from here: https://arxiv.org/abs/1407.5675
You can look into the OpenCV library for Python:
https://docs.opencv.org/master/d6/d00/tutorial_py_root.html.
It supports a lot of image processing functions.
In your case, it would probably be easiest to convert the image into a more suitable color space in which one axis stands for color intensity (e.g. HSI, HSL, HSV) and then find the indices of the maximum values along this axis (this should return the pixels with the highest intensity in the image).
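For example, a small sketch of that idea with OpenCV (the file name is a placeholder):

import cv2
import numpy as np

img = cv2.imread("image.png")                            # placeholder file name
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
value = hsv[:, :, 2]                                     # the V channel holds the intensity
y, x = np.unravel_index(np.argmax(value), value.shape)   # coordinates of the brightest pixel
print(f"Brightest pixel at (x={x}, y={y})")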
Generally, in Python, we use the PIL library for basic manipulations of images and OpenCV for advanced ones.
But, if I understand your task correctly, you can just think of an image as a multidimensional array and use numpy to manipulate it.
For example, if your image is stored in a variable of type numpy.array called img, you can find maximum value along the desired axis just by writing:
img.max(axis=0)
To normalize the image you can use:
img = img / img.max()  # use plain division; in-place /= fails on integer arrays
To find which image part is brighter, you can split the img array into the desired parts and calculate their means:
left = img[:, :img.shape[1] // 2, :]   # left half of the image
right = img[:, img.shape[1] // 2:, :]  # right half
left_mean = left.mean()
right_mean = right.mean()
Just as a project, I wanted to see if it would be possible to predict the next pixel in an image given all previous pixels.
For example: let's say I have an image with x pixels. Given the first y pixels, I want to be able to somewhat accurately predict the (y+1)-th pixel. How should I go about solving this problem?
You are looking for some kind of generative model. RNNs are commonly used, and there's a great blog post here demonstrating character-by-character text generation.
The same principle can be applied to any ordered sequence. You talk about an image as being a sequence of pixels, but images have an intrinsic 2D structure (3 if you include color) that would be lost if you took the exact same approach as text generation. A couple of ideas:
Use tensorflow's GridLSTMCells
Treat a column of pixels as a single element of the sequence and predict column-by-column or row-by-row (see the sketch after this list)
Combine idea 2 with some 1D convolutions along the column/row
Use features from a section of an image as the seed to a generative adversarial network. See this repository for basic implementations.
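For idea 2, a minimal sketch with Keras (the 28x28 image size, the fixed split row k, and the random stand-in data are all assumptions):

import numpy as np
from tensorflow.keras import layers, models

H, W = 28, 28                                # assumed grayscale images, values scaled to [0, 1]
model = models.Sequential([
    layers.Input(shape=(None, W)),           # a sequence of rows, each row = W pixels
    layers.LSTM(128),
    layers.Dense(W, activation="sigmoid"),   # the predicted next row
])
model.compile(optimizer="adam", loss="mse")

images = np.random.rand(64, H, W)            # stand-in for real training images
k = 10                                       # predict row k from rows 0..k-1
model.fit(images[:, :k, :], images[:, k, :], epochs=1)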
I have an image (or several hundred of them) that needs to be analyzed. The goal is to find all black spots that are close to each other.
For example, all black spots within a horizontal distance of 160 pixels and a vertical distance of 40 pixels.
For now I just look at each pixel, and if there is a black pixel I call a recursive method to find its neighbours (I can post the code too if you want).
It works, but it's very slow. At the moment the script runs for about 3-4 minutes depending on image size.
Is there some easy/fast way to accomplish this (ideally with a scikit-image method)? I'm using Python.
Edit: I tried to use skimage.measure.find_contours; now I have an array of arrays containing the contours of the black spots. Now I only need to find the contours in the neighbourhood of these contours.
When you get the coordinates of the different black spots, rather than computing all distances between all pairs of black pixels, you can use a cKDTree (in scipy.spatial, http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.html#scipy.spatial.cKDTree). The cKDTree method to use depends on your exact criterion (you can, for example, use cKDTree.query_ball_tree to know whether there exists a pair of points belonging to two different labels within a maximal distance that you give).
KD-trees are a great way to reduce the complexity of problems based on neighboring points. If you want to use them here, you'll need to rescale the coordinates so that you can use one of the classical norms to compute the distance between points.
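A sketch of how that could look (binary is assumed to be a boolean array where True marks black pixels; after rescaling, your 160x40 box criterion becomes a Chebyshev distance of 1):

import numpy as np
from scipy.spatial import cKDTree
from skimage import measure

labels = measure.label(binary)                          # one integer label per spot
coords = [np.argwhere(labels == lab) for lab in range(1, labels.max() + 1)]

# Rescale so the criterion (40 px vertical, 160 px horizontal) becomes
# a maximum-norm distance of 1 (rows are vertical, columns horizontal).
scale = np.array([1 / 40.0, 1 / 160.0])
trees = [cKDTree(c * scale) for c in coords]

close_pairs = []
for i in range(len(trees)):
    for j in range(i + 1, len(trees)):
        matches = trees[i].query_ball_tree(trees[j], r=1.0, p=np.inf)
        if any(len(m) > 0 for m in matches):            # some pixel pair satisfies the criterion
            close_pairs.append((i + 1, j + 1))          # label numbers of the close spots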
Disclaimer: I'm not proficient with the scikit-image library at all, but I've tackled similar problems using MATLAB, so I've searched for the equivalent methods in scikit-image, and I hope my findings below help you.
First you can use skimage.measure.label, which returns label_image, i.e. an image where all connected regions are labelled with the same number. I believe you should call this function with background=255 because, from your description, the background in your images is the white region (hence the value 255).
This is essentially an image where the background pixels are assigned the value 0 and the pixels that make up each (connected) spot are assigned the value of an integer label, so all the pixels of one spot will be labelled with the value 1, the pixels of another spot will be labelled with the value 2, and so on. Below I'll refer to "spots" and "labelled regions" interchangeably.
You can then call skimage.measure.regionprops, which takes as input the label_image obtained in the previous step. This function returns a list of RegionProperties (one for each labelled region), each a summary of the properties of a labelled region.
Depending on your definition of "find all black spots close to each other", there are different fields of the RegionProperties that you can use to help solve your problem:
bbox gives you the set of coordinates of the bounding box that contains that labelled region,
centroid gives you the coordinates of the centroid pixel of that labelled region,
local_centroid gives you the centroid relative to the bounding box bbox.
(Note there are also area and bbox_area properties, which you can use to decide whether to throw away very small spots you might not be interested in, thus reducing computation time when it comes to comparing the proximity of each pair of spots.)
If you're looking for a coarse comparison, then comparing the centroid or local_centroid between each pair of labelled regions might be enough.
Otherwise you can use the bbox coordinates to measure the exact distance between the outer bounds of any two regions.
If you want to base the decision on the precise distance between the pixel(s) of each pair of regions that are closest to each other, then you'll likely have to use the coords property.
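Putting the label/regionprops steps together (the threshold of 128 and the centroid-based box test below are my assumptions, not the only option):

from skimage import measure

# img is assumed to be a grayscale array with a white (255) background.
label_image = measure.label(img < 128)    # True where pixels are dark
regions = measure.regionprops(label_image)

# Coarse proximity test on centroids (160 px horizontal, 40 px vertical):
for a in regions:
    for b in regions:
        if a.label < b.label:
            dy = abs(a.centroid[0] - b.centroid[0])
            dx = abs(a.centroid[1] - b.centroid[1])
            if dx <= 160 and dy <= 40:
                print(f"spots {a.label} and {b.label} are close")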
If your input image is binary, you could separate your regions of interest as follows:
"grow" all the regions by the expected distance (actually half of it, as you grow from "both sides of the gap") with binary_dilation, where the structure is a kernel (e.g. rectangular: http://scikit-image.org/docs/dev/api/skimage.morphology.html#skimage.morphology.rectangle) of, let's say, 20x80pixels;
use the resulting mask as an input to skimage.measure.label to assign different values for different regions' pixels;
multiply your input image by the mask created above to zero out the pixels outside the dilated regions.
Here are the results of the proposed method on your image with kernel = rectangle(5, 5):
Dilated binary image (output of step 1):
Labeled version of the above (output of step 2):
Multiplication results (output of step 3):
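A sketch of those three steps (assuming binary is a boolean array where True marks the regions of interest):

from skimage import measure
from skimage.morphology import binary_dilation, rectangle

grown = binary_dilation(binary, rectangle(20, 80))   # step 1: grow by roughly half the gap
labels = measure.label(grown)                        # step 2: label the merged regions
result = binary * labels                             # step 3: keep labels only on the original pixels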
I want to overlay two views of the same scene - one is a white-light image (monochrome, used for reference) and the other is an image in a specific band (that has the real data I'm showing).
The white-light image is "reference", the data image is "data". They're ordinary 2D numpy arrays of identical dimensions. I want to show the white reference image using the 'gray' color map, and the data image using the 'hot' color map.
What is the "proper" way to do this?
I started with this:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
hotm = cm.ScalarMappable(cmap='hot')
graym = cm.ScalarMappable(cmap='gray')
ref_rgb = graym.to_rgba(reference) # rgba reference image, 'gray' color map
data_rgb = hotm.to_rgba(data) # rgba data image, 'hot' color map
plt.imshow(ref_rgb + data_rgb)
That didn't work well because in the plt.imshow() call the sum overflowed the range 0..1 (or maybe 0..255; this is confusing) and gave me crazy colors.
Then I replaced the last line with this:
plt.imshow(ref_rgb/2 + data_rgb/2)
That worked, but gives me a very washed-out, low-contrast image.
Finally, I tried this:
plt.imshow(np.maximum(ref_rgb, data_rgb))
That seems to give the best result, but I'm worried that much of my "data" is lost by having lower r, g, or b values than the reference image.
What is the "proper", or "usual" way to do this?
I'm not exactly sure what you're trying to achieve, but hopefully this will give you some ideas. :)
I've never used matplotlib, but from a quick look at the docs, it looks like matplotlib.cm gives you the option to have the pixel data as integers in the 0..255 range or as floats in the 0.0..1.0 range. The float format is more convenient for arithmetic image processing, so I'll assume that's the case in the rest of this answer.
We can do basic image processing by doing simple arithmetic on the RGB pixel values. Roughly speaking, adding (or subtracting) a constant to the RGB value of all your pixels changes the image brightness, multiplying your pixels by a constant changes the image contrast, and raising your pixels to a constant (positive) power changes the image gamma. Of course, you do need to make sure that these operations don't cause the colour values to go out of range. That's not a problem for gamma adjustment, or contrast adjustment (assuming the constant is in the 0.0..1.0 range), but it can be a problem for brightness modification. More subtle brightness & contrast modification can be achieved by suitable combinations of addition and multiplication.
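In code, those operations are one-liners (img assumed to hold floats in the 0.0..1.0 range):

import numpy as np

img = np.clip(img + 0.1, 0.0, 1.0)   # brightness: add a constant, clipping to stay in range
img = img * 0.8                      # contrast: multiply by a constant in 0.0..1.0
img = img ** 0.5                     # gamma: raise to a (positive) power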
When doing this sort of thing it's often a Good Idea to normalize the pixel values in your image data to the 0.0..1.0 range, before and/or after you've done your main processing.
Your code above is essentially treating the grey reference data as a kind of mask, using its values, instead of a constant, to operate on the colour data pixel by pixel. As you've seen, taking the mean of ref_rgb and data_rgb results in a washed-out image because you are reducing the contrast. But see what happens when you multiply ref_rgb and data_rgb: contrast will generally be increased, because dark areas in ref_rgb will darken the corresponding pixels in data_rgb, while bright areas in ref_rgb will leave the corresponding pixels in data_rgb virtually untouched.
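Continuing from your snippet, the multiplicative blend is a one-liner, and it can't go out of range, since products of values in 0.0..1.0 stay in 0.0..1.0:

blended = ref_rgb * data_rgb   # dark reference darkens, bright reference leaves data untouched
plt.imshow(blended)
plt.show()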
ImageMagick has some nice examples of arithmetic image processing.
Another thing to try is to convert your data_rgb to HSV format, and replace the V (value) data with the greyscale data from ref_rgb. And you can do similar tricks with the S (saturation) data, although the effect is generally a bit subtler.
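A sketch of that HSV trick, again continuing from your variables (both images assumed normalized to 0.0..1.0):

import matplotlib.colors as mcolors

hsv = mcolors.rgb_to_hsv(data_rgb[..., :3])   # drop the alpha channel for the conversion
hsv[..., 2] = reference / reference.max()     # replace V (value) with the grey reference
combined = mcolors.hsv_to_rgb(hsv)
plt.imshow(combined)
plt.show()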