Good morning everybody! I created a physically accurate scene in Blender and my aim, using Python, is to study the radiometric conditions over the rendered scene in order to obtain an illumination map in [W/m^2]. I saved the images in the OpenEXR file format because of its high-dynamic-range properties, and I want to obtain a relative luminance map starting from the RGB values in the "R", "G" and "B" channels. The major issue is how to scale the linear values from the OpenEXR channels to physically accurate values in [0, 1], which are needed to obtain the relative luminance map while maintaining the HDR properties of the file format. Part of the code is reported below.
import numpy as np
import OpenEXR as exr
import Imath

pt = Imath.PixelType(Imath.PixelType.FLOAT)
exrfile = exr.InputFile(filename)
dw = exrfile.header()['dataWindow']
size = (dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1)

def read_channel(name):
    # np.frombuffer replaces the deprecated binary mode of np.fromstring
    raw = exrfile.channel(name, pt)
    return np.frombuffer(raw, dtype=np.float32).reshape(size[1], size[0])  # Numpy arrays are (row, col)

red = read_channel('R')
green = read_channel('G')
blue = read_channel('B')
rel_luminance = 0.2126 * red + 0.7152 * green + 0.0722 * blue
For a test image the obtained Max values of the three channels are respectively:
Max(R) = 198.16421508789062
Max(G) = 173.5792999267578
Max(B) = 163.20120239257812
The obtained values are obviously not in the range [0, 1]. Moreover, I am not able to work out which global maximum value to use to scale the channels and obtain what I want.
Does anyone have some tips to solve my problem? Thanks in advance.
A few points…
RGB is tristimulus information. It will never be “radiometric” but rather radiometric-like representations within the limitations of tristimulus.
There is no limitation on the upper or lower limits in an EXR tristimulus encoding. The meaning comes from the ratios between the values, or an additional piece of information in the rare case the units are intended to be absolute.
A good rule of thumb is that any time the term lum is in a word, it is photometric (i.e. the human-centric domain), whereas rad is likely radiometric. Illuminance, luminance, etc. are photometric, massaged values, while irradiance and radiance are on the physical model side.
Calculating achromatic luminance from an RGB triplet is a weighted sum of components. For BT.709 based sRGB tristimulus systems, that weighting is 0.2126 * R + 0.7152 * G + 0.0722 * B. Again, note this is an approximation based on the CIE 1924 luminous efficiency function. Also note, luminance does not adequately represent the cumulative equivalent achromatic luminance contribution.
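As a minimal sketch of the weighted sum and of the "meaning comes from the ratios" point, the snippet below computes luminance from linear RGB and then expresses it relative to a reference value. The function names and the choice of the scene maximum as the reference are assumptions made for illustration only; a known diffuse white patch would be another option. The result is a ratio, not a value in W/m^2.

```python
import numpy as np

# BT.709 luminance weights, as quoted in the answer above
WEIGHTS = np.array([0.2126, 0.7152, 0.0722], dtype=np.float32)

def luminance(rgb):
    """Weighted sum over the last axis of a linear RGB array shaped (..., 3)."""
    return rgb @ WEIGHTS

def relative_to(lum, reference):
    """Express luminance as a ratio to a chosen reference value (hypothetical helper)."""
    return lum / reference

# Two example pixels: a unit grey and an HDR value well above 1
rgb = np.array([[[1.0, 1.0, 1.0], [198.0, 173.0, 163.0]]], dtype=np.float32)
lum = luminance(rgb)
# Assumption: use the scene maximum as the reference, so the result lies in [0, 1]
rel = relative_to(lum, reference=lum.max())
```

Dividing by the scene maximum only rescales the data; the ratios between pixels, which carry the meaning, are unchanged.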
I have a fixed camera mounted on a wall viewing a rectangular lawn at an angle. My goal is to obtain an undistorted, top-down view of the lawn.
I have an image from the camera as a Python NumPy array which looks like this:
raw camera image
I use an inverse matrix with skimage.transform.warp to correct the image to a top down view:
top down distorted
This works perfectly, however the camera lens introduces barrel distortion.
Separately, I can correct the distortion with a generated lookup table using skimage.transform.warp_coords and passing a simple undistort callable function based on the algorithm described here.
The image is then generated using scipy.ndimage.map_coordinates:
undistorted camera view
These 2 processes work individually, but how do I combine them to create an undistorted top-down view, without creating an intermediate image?
I could run each point in the lookup table through the matrix to create a new table, but the table is massive and memory is tight (Raspberry Pi Zero).
I would like to define the undistortion as a matrix and just combine the 2 matrices, but as I understand it, the projective homography matrix is linear but undistortion is non-linear, so this can't be done. I can't use OpenCV due to resource constraints, and the calibration procedure involving multiple chessboard images is impractical. Currently, I calibrate by taking 4 lawn corner points and generate the matrix from them, which works well.
I would have anticipated that this is a common problem in Computer Vision but can't find any suitable solutions.
The barrel distortion is nonlinear, but it is also smooth. This means it can be well approximated by a collection of piecewise linear approximations.
So you do not need a large, per-pixel look-up table of un-distortion displacements. Rather, you can subsample it (or just scale it down), and use bilinear interpolation for in-between pixels.
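A rough sketch of that idea follows, assuming the mapping is a callable that takes an (N, 2) array of (x, y) points and returns the mapped points. The helper names, the sampling step and the row-block interface are all hypothetical. Only the coarse table stays in memory; full-resolution coordinates are interpolated for a few rows at a time using scipy.ndimage.map_coordinates with order=1 (bilinear).

```python
import numpy as np
from scipy import ndimage

def build_coarse_map(mapping, rows, cols, step):
    """Evaluate `mapping` on a grid sampled every `step` pixels."""
    n_r = int(np.ceil((rows - 1) / step)) + 1
    n_c = int(np.ceil((cols - 1) / step)) + 1
    yy, xx = np.mgrid[0:n_r, 0:n_c] * float(step)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    out = mapping(pts)
    # Shape (2, n_r, n_c): source x and source y at each coarse node
    return out.T.reshape(2, n_r, n_c).astype(np.float32)

def rows_from_coarse(coarse, step, row_start, row_stop, cols):
    """Bilinearly interpolate the coarse map for rows [row_start, row_stop)."""
    r, c = np.mgrid[row_start:row_stop, 0:cols]
    idx = np.array([r / step, c / step])
    src_x = ndimage.map_coordinates(coarse[0], idx, order=1, mode="nearest")
    src_y = ndimage.map_coordinates(coarse[1], idx, order=1, mode="nearest")
    return src_x, src_y

# Toy mapping for demonstration; a real one would be the undistort function
mapping = lambda pts: pts * 2.0 + 1.0
rows, cols, step = 10, 13, 4
coarse = build_coarse_map(mapping, rows, cols, step)
src_x, src_y = rows_from_coarse(coarse, step, 0, rows, cols)
```

For a smooth mapping the interpolation error shrinks as the step gets smaller, so the step is a trade-off between memory and accuracy.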
I have found a solution that appears to work by creating separate functions for undistort and transformation, then chaining them together.
The skimage source code here has the _apply_mat method for generating a mapping from a matrix. I based my unwarp function on that:
def unwarp(coords, matrix):
    coords = np.array(coords, copy=False, ndmin=2)
    x, y = np.transpose(coords)
    src = np.vstack((x, y, np.ones_like(x)))
    dst = src.T @ matrix.T
    # below, we will divide by the last dimension of the homogeneous
    # coordinate matrix. In order to avoid division by zero,
    # we replace exact zeros in this column with a very small number.
    dst[dst[:, 2] == 0, 2] = np.finfo(float).eps
    # rescale to homogeneous coordinates
    dst[:, :2] /= dst[:, 2:3]
    return dst[:, :2]
I created a similar function for undistorting based on Tanner Helland's algorithm:
def undistort(coords, cols, rows, correction_radius, zoom):
    half_width = cols / 2
    half_height = rows / 2
    new_x = coords[:, 0] - half_width
    new_y = coords[:, 1] - half_height
    distance = np.hypot(new_x, new_y)
    r = distance / correction_radius
    theta = np.ones_like(r)
    # only process non-zero values
    np.divide(np.arctan(r), r, out=theta, where=r!=0)
    source_x = half_width + theta * new_x * zoom
    source_y = half_height + theta * new_y * zoom
    result = np.column_stack([source_x, source_y])
    return result
The only tricky bit here is the divide where we need to prevent division by zero.
Once we have each lookup table we can chain them together:
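def undistort_unwarp(coords):
    # cols, rows, correction_radius, zoom and matrix are assumed to be
    # defined in the enclosing scope; warp_coords passes coords only
    undistorted = undistort(coords, cols, rows, correction_radius, zoom)
    both = unwarp(undistorted, matrix)
    return both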
Note that these are the callable functions passed to skimage.transform.warp_coords:
mymap = tf.warp_coords(undistort_unwarp, shape=(rows, cols), dtype=np.int16)
The map can then be passed to the skimage.transform.warp function.
Francesco's answer was helpful, however I needed the full pixel resolution for the transformation, so I used it for the undistort as well, and looked to other ways to reduce the memory consumption.
Each map consumes
rows * cols * bytes-per-item * 2 (x and y)
bytes. The default datatype is float64, which requires 8 bytes-per-item, and the documentation suggests sane choices would be the default or float32 at 4 bytes-per-item. I was able to reduce this to 2 bytes-per-item using int16 with no visible ill effects, but I suspect the spline interpolation is not being used to the full (at all?).
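To make the arithmetic concrete, here is a small sketch of that formula. The helper name and the 480 x 640 image size are assumptions chosen only for illustration.

```python
import numpy as np

def map_bytes(rows, cols, dtype):
    """Bytes used by a coordinate map holding one x and one y value per pixel."""
    return rows * cols * np.dtype(dtype).itemsize * 2

# Hypothetical 480 x 640 image, compared across the three data types discussed
for dt in (np.float64, np.float32, np.int16):
    print(np.dtype(dt).name, map_bytes(480, 640, dt))
```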
The map is the same for each channel of a colour RGB image. However, when I called warp_coords with shape=(rows, cols, 3) I got 3 duplicate maps, so I created a function to handle colour images by processing each channel separately:
def warp_colour(img_arr, coord_map):
    if img_arr.ndim == 3:
        # colour
        rows, cols, _chans = img_arr.shape
        r_arr = tf.warp(img_arr[:, :, 0], inverse_map=coord_map, output_shape=(rows, cols))
        g_arr = tf.warp(img_arr[:, :, 1], inverse_map=coord_map, output_shape=(rows, cols))
        b_arr = tf.warp(img_arr[:, :, 2], inverse_map=coord_map, output_shape=(rows, cols))
        rgb_arr = np.dstack([r_arr, g_arr, b_arr])
    else:
        # grayscale
        rows, cols = img_arr.shape
        rgb_arr = tf.warp(img_arr, inverse_map=coord_map, output_shape=(rows, cols))
    return rgb_arr
One issue with skimage.transform.warp_coords is that it does not have the map_args dictionary parameter that skimage.transform.warp has. I had to call my unwarp and undistort functions through an intermediate function to add the parameters.
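One possible way to write such an intermediate function is functools.partial, which binds the extra parameters and leaves a callable that takes only the coordinates. The sketch below uses a toy mapping (scale_about is hypothetical, not the undistort or unwarp function above) just to show the pattern.

```python
from functools import partial
import numpy as np

def scale_about(coords, centre, factor):
    """Toy coordinate mapping with extra parameters: scale (x, y) points about a centre."""
    coords = np.asarray(coords, dtype=float)
    return centre + (coords - centre) * factor

# Bind the extra parameters; the result accepts coords as its only argument
one_arg = partial(scale_about, centre=np.array([320.0, 240.0]), factor=0.5)

pts = np.array([[0.0, 0.0], [640.0, 480.0]])
mapped = one_arg(pts)
```

A callable built this way can be passed wherever a one-argument coordinate function is expected.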
The type of my train_data is 'Array of uint16' and its shape is (96108, 7, 7), so there are 96108 images.
These images differ from ordinary images. The sensor is 7x7, and each of the 49 pixels contains the number of light detections. One image holds the number of detections counted between 0 and 1 second. Since the sensor detects randomly per unit time, the maximum pixel value differs from image to image.
If the max value of all images were 255, I could do 'train_data/255', but I can't use a single divisor because the max value is different for each image.
I want to scale the pixel values of all images to the range 0 to 1.
What should I do?
Contrast Normalization (or contrast stretch) should not be confused with Data Normalization which maps data between 0.0-1.0.
Data Normalization
We use the following formula to normalize data, where the min() and max() values are the minimum and maximum values that the data type can represent:
x_i' = (x_i - min(x)) / (max(x) - min(x))
When we use it with images, x is the whole image and i is an individual pixel of that image. If you are using an 8-bit image the min() and max() values become 0 and 255 respectively. This should not be confused with the minimum and maximum values presented within your image in question.
To convert an 8-bit image into a floating-point image, the math simplifies to image/255 because the min() value is 0.
img = img/255
NumPy outputs arrays in 64-bit floating point by default. To test methods intended for 8-bit images with NumPy, an 8-bit array is required as the input:
image = np.random.randint(0, 256, (7, 7), dtype=np.uint8)  # the upper bound is exclusive
normalized_image = image/255
When we examine the output of the above two lines we can see the maximum value of the image is 252 which has now mapped to 0.9882352941176471 on the 64-bit normalized image.
However, in most cases, you wouldn't need a 64-bit image. You can output (or in other words cast) it to 32-bit (or 16-bit) using the following code. If you try to cast it to 8-bit it will throw an error. Using '/' for division is a shorthand for np.true_divide but lacks the ability to define the output data format.
normalized_image_2 = np.true_divide(image, 255, dtype=np.float32)
The properties of the new array are shown below. You can see that the number of digits is now reduced and 252 has been remapped to 0.9882353.
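As a hedged sketch of data normalization driven by the type's range rather than by the image content, the limits can be looked up with np.iinfo. The function name and the float32 default are assumptions made here for illustration.

```python
import numpy as np

def normalize_by_dtype(img, out_dtype=np.float32):
    """Map an integer image to [0, 1] using the limits of its data type."""
    info = np.iinfo(img.dtype)
    return (img.astype(out_dtype) - info.min) / (info.max - info.min)

img8 = np.array([0, 252, 255], dtype=np.uint8)
img16 = np.array([0, 65535], dtype=np.uint16)
norm8 = normalize_by_dtype(img8)
norm16 = normalize_by_dtype(img16)
```

Here 252 maps to roughly 0.988 regardless of what other values the image contains, because the divisor comes from the type and not from the data.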
Contrast Normalization
The method shown by @3dSpatialUser effectively does a partial contrast normalization, meaning it stretches the intensities of the image within the available intensity range. Test it with an 8-bit array with the following code.
c_image = np.random.randint(64,128, (7,7), dtype=np.uint8)
cn_image = (c_image - c_image.min()) / (c_image.max()- c_image.min())
The contrast is now stretched, mapping the minimum intensity of 64 to 0.0 and the maximum of 127 to 1.0.
The formula for contrast normalization is shown below.
To use the above formula with NumPy and remap the data back to the 8-bit input format after contrast normalization, the image should be multiplied by 255 and the data type changed back to uint8:
cn_image_correct = (c_image - c_image.min()) / (c_image.max()- c_image.min()) * 255
cn_image_correct = cn_image_correct.astype(np.uint8)
64 is now mapped to 0 and 127 is mapped to 255, stretching the contrast.
Where the confusion arises
In most applications, the intensity values of an image are spread close to their minima and maxima. Hence, when we apply the normalization formula using the min and max values present within the image, instead of the min and max of the available range, it will output a better looking image (in most cases) within the 0.0-1.0 range, which effectively normalizes both data and contrast at the same time. Also, image editing software performs gamma corrections or remapping when switching between image data types (8/16/32 bits).
import numpy as np
data = np.random.normal(loc=0, scale=1, size=(96108, 7, 7))
data_min = np.min(data, axis=(1,2), keepdims=True)
data_max = np.max(data, axis=(1,2), keepdims=True)
scaled_data = (data - data_min) / (data_max - data_min)
EDIT: I have voted for the other answer since that is a cleaner way (in my opinion) to do it, but the principles are the same.
EDIT v2: I saw the comment and I see the difference. I will rewrite my code so it is "cleaner" with less extra variables but still correct using min/max:
data -= data.min(axis=(1,2), keepdims=True)
data /= data.max(axis=(1,2), keepdims=True)
First the minimum value is moved to zero, thereafter one can take the maximum value to get the full range (max-min) of the specific image.
After this step np.array_equal(data, scaled_data) = True.
You can gather the maximum values with np.ndarray.max across multiple axes: here axis=1 and axis=2 (i.e. on each image individually). Then normalize the initial array with it. To avoid having to broadcast this array of maxima yourself, you can use the keepdims option:
>>> x = np.random.rand(96108,7,7)
>>> x.max(axis=(1,2), keepdims=True).shape
(96108, 1, 1)
While x.max(axis=(1,2)) alone would have returned an array shaped (96108,)...
Such that you can then do:
>>> x /= x.max(axis=(1,2), keepdims=True)
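For an integer stack such as the uint16 data in the question, the in-place division above would not work directly, since the result is floating point. A minimal sketch that converts to float first is shown below. The function name, the float32 choice and the guard for constant images are assumptions added here, and the random data only stands in for the real train_data.

```python
import numpy as np

def scale_each_image(stack):
    """Min-max scale every image in an (N, H, W) stack to [0, 1] independently."""
    data = stack.astype(np.float32)
    lo = data.min(axis=(1, 2), keepdims=True)
    hi = data.max(axis=(1, 2), keepdims=True)
    span = hi - lo
    # A constant image has span 0; dividing by 1 instead leaves it at all zeros
    span[span == 0] = 1.0
    return (data - lo) / span

rng = np.random.default_rng(0)
train_data = rng.integers(0, 1000, size=(5, 7, 7), dtype=np.uint16)
train_data[0] = 3  # one constant image to exercise the guard
scaled = scale_each_image(train_data)
```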
I have some images for which I want to calculate the Minkowski/box count dimension to determine the fractal characteristics in the image. Here are 2 example images:
10.jpg:
24.jpg:
I'm using the following code to calculate the fractal dimension:
import numpy as np
from PIL import Image  # scipy.misc.imread is gone from recent SciPy releases; Pillow is used to load the file instead


def rgb2gray(rgb):
    r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray


def fractal_dimension(Z, threshold=0.9):
    # Only for 2d image
    assert(len(Z.shape) == 2)

    # From https://github.com/rougier/numpy-100 (#87)
    def boxcount(Z, k):
        S = np.add.reduceat(
            np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
            np.arange(0, Z.shape[1], k), axis=1)
        # We count non-empty (0) and non-full boxes (k*k)
        return len(np.where((S > 0) & (S < k*k))[0])

    # Transform Z into a binary array
    Z = (Z < threshold)
    # Minimal dimension of image
    p = min(Z.shape)
    # Greatest power of 2 less than or equal to p
    n = 2**np.floor(np.log(p)/np.log(2))
    # Extract the exponent
    n = int(np.log(n)/np.log(2))
    # Build successive box sizes (from 2**n down to 2**2)
    sizes = 2**np.arange(n, 1, -1)
    # Actual box counting with decreasing size
    counts = []
    for size in sizes:
        counts.append(boxcount(Z, size))
    # Fit the successive log(sizes) with log (counts)
    coeffs = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -coeffs[0]


I = rgb2gray(np.asarray(Image.open("24.jpg")))
print("Minkowski–Bouligand dimension (computed): ", fractal_dimension(I))
From the literature I've read, it has been suggested that natural scenes (e.g. 24.jpg) are more fractal in nature, and thus should have a larger fractal dimension value.
The results it gives me are in the opposite direction to what the literature would suggest:
10.jpg: 1.259
24.jpg: 1.073
I would expect the fractal dimension for the natural image to be larger than for the urban one.
Am I calculating the value incorrectly in my code? Or am I just interpreting the results incorrectly?
With the fractal dimension of something physical, the dimension might converge at different stages to different values. For example, a very thin line (but of finite width) would initially seem one dimensional, then eventually two dimensional as its width becomes of comparable size to the boxes used.
Let's see the dimensions that you have produced:
What do you see? Well, the linear fits are not so good, and the dimension is going towards a value of two.
To diagnose, let's take a look at the grey-scale images produced, with the threshold that you have (that is, 0.9):
The nature picture has almost become an ink blob. The dimensions would go to a value of 2 very soon, as the graphs told us. That is because we pretty much lost the image.
And now with a threshold of 50?
With new linear fits that are much better, the dimensions are 1.6 and 1.8 for urban and nature respectively. Keep in mind, that the urban picture actually has a lot of structure to it, in particular on the textured walls.
In future, good threshold values would be ones closer to the mean of the grey-scale image; that way your image does not turn into a blob of ink!
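As a small sketch of that suggestion, the threshold can be taken from the image itself instead of being hard-coded. The helper name and the tiny example array are assumptions for illustration; the mean is only one reasonable starting point, not a guaranteed optimum.

```python
import numpy as np

def binarize_at_mean(gray):
    """Threshold a grey-scale image at its own mean; returns (mask, threshold)."""
    threshold = gray.mean()
    return gray < threshold, threshold

# Tiny example on a 0-255 scale
gray = np.array([[10.0, 200.0], [30.0, 240.0]])
mask, threshold = binarize_at_mean(gray)
```

With the function from the question, the same idea would amount to calling fractal_dimension(I, threshold=I.mean()).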
A good text book on this is "Fractals everywhere" by Michael F. Barnsley.
For understanding purposes, I want to implement a stereo algorithm in Python (and NumPy) that computes a disparity map. As image data, I used the Tsukuba image dataset from Middlebury*. For simplicity, I chose normalised cross correlation (NCC)** as the similarity measure to find corresponding pixels. I will assume scanline agreement.
Here is my NCC implementation:
left_mu = np.mean(left_patch)
right_mu = np.mean(right_patch)
left_sigma = np.sqrt(np.mean((left_patch - left_mu)**2))
right_sigma = np.sqrt(np.mean((right_patch - right_mu)**2))
patch = left_patch * right_patch
mu = left_mu * right_mu
num = np.mean(patch) - mu
denom = left_sigma * right_sigma
ncc = num/denom
where left_patch and right_patch are 3x3 patches from the original images. This outputs a value between -1 and 1, which describes the similarity between two pixels.
The idea is now to find the best-fit pixel. The disparity between the two pixels should now be stored in a new image - the disparity map.
Since I assumed scanline agreement I only have to search in one image row. For each pixel in the row, I want to take the index of the value that maximises the NCC value and store it as the disparity value.
My problem now is that my results are rather odd. My disparity values are around 180-200 pixels for an image which is 384x288 pixels. Here is the resulting image.
Can you see the mistake in my thinking?
(*) vision.middlebury.edu/stereo/data/scenes2001/data/anigif/orig/tsukuba_o_a.gif
(**) A two-stage correlation method for stereoscopic depth estimation. - N. Einecke and J. Eggert
It seems that you didn't compute the numerator properly. It should be:
num = np.mean( (left_patch - left_mu) * (right_patch - right_mu) )
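A minimal sketch of NCC built around this centred numerator is given below. Casting the patches to float and adding a small eps to the denominator are assumptions added here. Algebraically, mean(L * R) - mean(L) * mean(R) is the same covariance, so the two forms should differ only through numerical effects such as integer overflow when the patches are stored as uint8.

```python
import numpy as np

def ncc(left_patch, right_patch, eps=1e-12):
    """Normalised cross correlation of two equally sized patches."""
    # Cast to float so products of uint8 pixels cannot overflow (assumption)
    left = np.asarray(left_patch, dtype=np.float64)
    right = np.asarray(right_patch, dtype=np.float64)
    left_c = left - left.mean()
    right_c = right - right.mean()
    num = np.mean(left_c * right_c)
    denom = np.sqrt(np.mean(left_c ** 2) * np.mean(right_c ** 2))
    # eps is an added guard for flat patches whose variance is zero
    return num / (denom + eps)

patch = (np.arange(9, dtype=np.uint8) * 25).reshape(3, 3)
same = ncc(patch, patch)
inverted = ncc(patch, 200 - patch)
flat = ncc(patch, np.full((3, 3), 7, dtype=np.uint8))
```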
I want to extract HOG features from line images of Arabic handwriting. The code is as follows. I would like help with how to input the image and how to output the features. Can anyone please help me with this?
import numpy as np
# sqrt, pi, arctan2, cos and sin are no longer exported by recent SciPy releases
from numpy import sqrt, pi, arctan2, cos, sin
from scipy.ndimage import uniform_filter
def hog(image, orientations=9, pixels_per_cell=(8, 8),
        cells_per_block=(3, 3), visualise=False, normalise=False):
    """Extract Histogram of Oriented Gradients (HOG) for a given image.

    Compute a Histogram of Oriented Gradients (HOG) by

        1. (optional) global image normalisation
        2. computing the gradient image in x and y
        3. computing gradient histograms
        4. normalising across blocks
        5. flattening into a feature vector

    Parameters
    ----------
    image : (M, N) ndarray
        Input image (greyscale).
    orientations : int
        Number of orientation bins.
    pixels_per_cell : 2 tuple (int, int)
        Size (in pixels) of a cell.
    cells_per_block : 2 tuple (int,int)
        Number of cells in each block.
    visualise : bool, optional
        Also return an image of the HOG.
    normalise : bool, optional
        Apply power law compression to normalise the image before
        processing.

    Returns
    -------
    newarr : ndarray
        HOG for the image as a 1D (flattened) array.
    hog_image : ndarray (if visualise=True)
        A visualisation of the HOG image.

    References
    ----------
    * http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients
    * Dalal, N and Triggs, B, Histograms of Oriented Gradients for
      Human Detection, IEEE Computer Society Conference on Computer
      Vision and Pattern Recognition 2005 San Diego, CA, USA
    """
    image = np.atleast_2d(image)

    """
    The first stage applies an optional global image normalisation
    equalisation that is designed to reduce the influence of illumination
    effects. In practice we use gamma (power law) compression, either
    computing the square root or the log of each colour channel.
    Image texture strength is typically proportional to the local surface
    illumination so this compression helps to reduce the effects of local
    shadowing and illumination variations.
    """

    # the code below unpacks image.shape into two values, so only 2-D input works
    if image.ndim > 2:
        raise ValueError("Currently only supports grey-level images")

    if normalise:
        image = sqrt(image)

    """
    The second stage computes first order image gradients. These capture
    contour, silhouette and some texture information, while providing
    further resistance to illumination variations. The locally dominant
    colour channel is used, which provides colour invariance to a large
    extent. Variant methods may also include second order image derivatives,
    which act as primitive bar detectors - a useful feature for capturing,
    e.g. bar like structures in bicycles and limbs in humans.
    """

    gx = np.zeros(image.shape)
    gy = np.zeros(image.shape)
    gx[:, :-1] = np.diff(image, n=1, axis=1)
    gy[:-1, :] = np.diff(image, n=1, axis=0)

    """
    The third stage aims to produce an encoding that is sensitive to
    local image content while remaining resistant to small changes in
    pose or appearance. The adopted method pools gradient orientation
    information locally in the same way as the SIFT [Lowe 2004]
    feature. The image window is divided into small spatial regions,
    called "cells". For each cell we accumulate a local 1-D histogram
    of gradient or edge orientations over all the pixels in the
    cell. This combined cell-level 1-D histogram forms the basic
    "orientation histogram" representation. Each orientation histogram
    divides the gradient angle range into a fixed number of
    predetermined bins. The gradient magnitudes of the pixels in the
    cell are used to vote into the orientation histogram.
    """

    magnitude = sqrt(gx ** 2 + gy ** 2)
    orientation = arctan2(gy, (gx + 1e-15)) * (180 / pi) + 90

    sy, sx = image.shape
    cx, cy = pixels_per_cell
    bx, by = cells_per_block

    n_cellsx = int(np.floor(sx // cx))  # number of cells in x
    n_cellsy = int(np.floor(sy // cy))  # number of cells in y

    # compute orientations integral images
    orientation_histogram = np.zeros((n_cellsy, n_cellsx, orientations))
    for i in range(orientations):
        # create new integral image for this orientation
        # isolate orientations in this range
        temp_ori = np.where(orientation < 180 / orientations * (i + 1),
                            orientation, 0)
        temp_ori = np.where(orientation >= 180 / orientations * i,
                            temp_ori, 0)
        # select magnitudes for those orientations
        cond2 = temp_ori > 0
        temp_mag = np.where(cond2, magnitude, 0)

        # integer division is required for slice indices in Python 3
        orientation_histogram[:, :, i] = uniform_filter(temp_mag, size=(cy, cx))[cy // 2::cy, cx // 2::cx]

    # now for each cell, compute the histogram
    #orientation_histogram = np.zeros((n_cellsx, n_cellsy, orientations))

    radius = min(cx, cy) // 2 - 1
    hog_image = None
    if visualise:
        hog_image = np.zeros((sy, sx), dtype=float)

    if visualise:
        from skimage import draw

        for x in range(n_cellsx):
            for y in range(n_cellsy):
                for o in range(orientations):
                    centre = tuple([y * cy + cy // 2, x * cx + cx // 2])
                    dx = radius * cos(float(o) / orientations * np.pi)
                    dy = radius * sin(float(o) / orientations * np.pi)
                    # draw.line replaces the old draw.bresenham and needs integer endpoints
                    rr, cc = draw.line(int(centre[0] - dx), int(centre[1] - dy),
                                       int(centre[0] + dx), int(centre[1] + dy))
                    hog_image[rr, cc] += orientation_histogram[y, x, o]

    """
    The fourth stage computes normalisation, which takes local groups of
    cells and contrast normalises their overall responses before passing
    to next stage. Normalisation introduces better invariance to illumination,
    shadowing, and edge contrast. It is performed by accumulating a measure
    of local histogram "energy" over local groups of cells that we call
    "blocks". The result is used to normalise each cell in the block.
    Typically each individual cell is shared between several blocks, but
    its normalisations are block dependent and thus different. The cell
    thus appears several times in the final output vector with different
    normalisations. This may seem redundant but it improves the performance.
    We refer to the normalised block descriptors as Histogram of Oriented
    Gradient (HOG) descriptors.
    """

    n_blocksx = (n_cellsx - bx) + 1
    n_blocksy = (n_cellsy - by) + 1
    normalised_blocks = np.zeros((n_blocksy, n_blocksx,
                                  by, bx, orientations))

    for x in range(n_blocksx):
        for y in range(n_blocksy):
            block = orientation_histogram[y:y + by, x:x + bx, :]
            eps = 1e-5
            normalised_blocks[y, x, :] = block / sqrt(block.sum() ** 2 + eps)

    """
    The final step collects the HOG descriptors from all blocks of a dense
    overlapping grid of blocks covering the detection window into a combined
    feature vector for use in the window classifier.
    """

    if visualise:
        return normalised_blocks.ravel(), hog_image
    else:
        return normalised_blocks.ravel()
You can use the OpenCV library to read image files into NumPy arrays.