Generating probabilities from patches of an image - python

I am working with an image of size 512x512. The image is divided into patches using einops with a patch size of 32. There are 256 patches overall; in other words, we get a new "image" of size 256x1024.
Since this image is actually a mask for a segmentation problem, it is comprised of only 4 values (4 classes): 0 for the background, 1 for the first class, 2 for the second class, 3 for the third class.
My goal is to take every patch and compute, for every class C, the following ratio:
Number of pixels labeled C / Number of pixels in this patch.
This should give me an array of size 4, where the first entry is the number of background pixels (labeled as 0) over the total number of pixels in the patch (1024), and the second, third and fourth entries are the same but for the corresponding class.
In theory, I know that I need to iterate over every single patch, count how many pixels of each class exist in the current patch, then divide by 1024. Doing this 256 times yields exactly what I want. The problem is that I have a (very) large number of images that I need to do this for, and the size of 512 is just an example to keep the question simple, so a for loop is out of the question.
I know that I can get the result I want using numpy. I tried both numpy.apply_over_axes and numpy.apply_along_axis, but I don't know which one is better suited for this task; there is also numpy.where, which I don't know how to apply here.
Here is what I did:
from einops import rearrange
import numpy as np
labn = np.random.randint(4, size=(512, 512)) # every pixel in this image has value 0, 1, 2 or 3
to_patch = rearrange(labn, "(h p1) (w p2) -> (h w) (p1 p2)", p1=32, p2=32)
print(to_patch.shape) # (256,1024)
c0 = np.full(1024, 0)
c1 = np.full(1024, 1)
c2 = np.full(1024, 2)
c3 = np.full(1024, 3)
def f(a):
    _c0 = a == c0
    _c1 = a == c1
    _c2 = a == c2
    _c3 = a == c3
    pr = np.array([np.sum(_c0), np.sum(_c1), np.sum(_c2), np.sum(_c3)]) / 1024
    return pr
resf = np.apply_along_axis(f, 1, to_patch)
print(resf.shape) # (256, 4)
Two things:
I want the output to be 256x4, where every array along the second axis sums to one.
Is there a faster/better/pythonic way to do this, preferably vectorized?
EDIT: I forgot to apply np.sum inside f; with it, I do get 256x4.

There is a built-in function in PyTorch to count occurrences, torch.histc; it is similar to Python's collections.Counter.
torch.histc(input, bins=100, min=0, max=0, *, out=None) → Tensor
Computes the histogram of a tensor.
The elements are sorted into equal width bins between min and max. If
min and max are both zero, the minimum and maximum values of the data
are used.
Elements lower than min and higher than max are ignored.
You need to specify the number of bins, here the number of classes C, as well as the min and max values for the binning. Also, it won't work per-dimension on a multi-dimensional tensor: the resulting tensor contains global statistics of the whole input regardless of its shape. As a possible workaround, you can iterate through your patches, calling torch.histc each time, then stack the results and normalize:
resf = torch.stack([torch.histc(patch, C, min=0, max=C-1) for patch in x]) / x.size(1)
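If you want to stay in NumPy and avoid the loop entirely, one fully vectorized sketch (assuming class ids 0..C-1 and reusing the to_patch array from the question) is to broadcast a comparison against all class ids and average over the pixel axis:
import numpy as np

C = 4
# (256, 1024, 1) == (C,) broadcasts to (256, 1024, C); the mean over the
# pixel axis yields per-patch class frequencies, so every row sums to one
resf = (to_patch[..., None] == np.arange(C)).mean(axis=1)
print(resf.shape)  # (256, 4)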

Related

NumPy Nearest Neighbor Line Fitting Across Moving Window

I have two two-dimensional arrays loaded into NumPy, both of which are 80i x 80j in size. I'm looking to do a moving window polyfit calculation across these arrays. I've nailed down how to conduct the polyfit but am stuck on the specific moving window approach I'm looking to accomplish. I'm aiming for:
1) At each index of Array 1 (a1), code isolates all the values of the index & its closest 8 neighbors into a separate 1D array, and repeats over the same window at Array 2 (a2)
2) With these two new arrays isolated, perform linear regression line-fitting using NumPy's polyfit in the following approach: model = np.polyfit(a1slice, a2slice, 1)
3) Cast the resulting regression coefficient and intercept (example output doing this manually: array([-0.02114911, 10.02127152])) to the same index of two other arrays, where model[0] would be placed into the first new array and model[1] into the second new array at this index.
4) The code then moves sequentially to the next index and performs steps 1-3 again, or a1(i+1, j+0, etc.)
I've provided a graphical example of what I'm trying to achieve for two random index selections across Array 1, with the calculation across each index's eight nearest neighbors, in case this makes the desired result easier to understand:
I've written a function to get the window, but there is a troublesome edge case when the index is on the edge and one distance away from the corner. You'll probably want to modify the function to get exactly what you're looking for.
import numpy as np
import matplotlib.pyplot as plt

def get_neighbors(arr, origin, num_neighbors=8):
    # coordinates of every cell, shaped like arr with a trailing (i, j) pair
    coords = np.array([[i, j] for (i, j), value in np.ndenumerate(arr)]).reshape(arr.shape + (2,))
    # Euclidean distance of every cell from the origin
    distances = np.linalg.norm(coords - origin, axis=-1)
    # distance of the num_neighbors-th closest cell (the origin itself sits at distance 0)
    neighbor_limit = np.sort(distances.ravel())[num_neighbors]
    window = np.where(distances <= neighbor_limit)
    exclude_window = np.where(distances > neighbor_limit)
    return window, exclude_window, distances

test = np.zeros((5, 5))
plt.close('all')
cases = [[0, 0], [0, 1], [0, 2]]
for case in cases:
    window, exclude, distances = get_neighbors(test, case)
    distances[exclude] = 0
    plt.figure()
    plt.imshow(distances)
(Image outputs: the three distance maps, one per test case, with cells outside the window zeroed.)
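To cover steps 2) and 3) of the question, a minimal driver sketch (reusing the get_neighbors function above and inheriting its edge-case caveat; moving_window_fit is a hypothetical name) could look like:
def moving_window_fit(a1, a2):
    # slope and intercept arrays, filled index by index
    slope = np.zeros(a1.shape)
    intercept = np.zeros(a1.shape)
    for (i, j), _ in np.ndenumerate(a1):
        window, _, _ = get_neighbors(a1, [i, j])
        # np.polyfit with degree 1 returns [coefficient, intercept]
        model = np.polyfit(a1[window], a2[window], 1)
        slope[i, j], intercept[i, j] = model
    return slope, intercept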

Numpy array - Two unknown dimensions - png files

I have a dataset consisting of a bunch of png files with different heights and widths.
I read these files in with the following code to get a numpy array. In this case, it is 2D. But actually I want to get a 3D array which consists of the number of images n, the height of the images h and the width w.
import os.path
import glob
import numpy as np

def open_images(images_directory):
    pattern_to_match = os.path.join(images_directory, "*.png")
    png_files = (x for x in glob.iglob(pattern_to_match)
                 if os.path.isfile(x))
    for current_png_filename in png_files:
        print("Opening file", current_png_filename)
        with open(current_png_filename, "rb") as current_png_file:
            data = current_png_file.read()
        return np.frombuffer(data, dtype=np.uint8, offset=16)\
            .reshape(-1, 3)\
            .astype(np.float32)

directory_to_search = r"C:\Users\tobis\OneDrive\Desktop\Masterarbeit\data\2017-IWT4S-HDR_LP-dataset\crop_h1"
open_images(directory_to_search)
At the moment, I get an array with a shape like this: (21559, 3). I think the first number is a combination of width and height and the last is the RGB value. I would like to get an array that looks like this one: (n, h, w).
Is there a way to get such an array? Unfortunately, I have two unknown dimensions. This seems to be the problem...
You can't just read an image file like that. You need to use a library to read it and interpret the height, width, colourspace, bits per pixel, date, the GPS data, the camera make and model and all the compressed, encoded pixels.
For example, with PIL/Pillow:
from PIL import Image
import numpy as np
# Open image and make sure it is RGB - not palette
im = Image.open('image.png').convert('RGB')
# Make into Numpy array
na = np.array(im)
# Check shape
print(na.shape) # prints (480,640,3) for height, width, channels
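To build the (n, h, w, 3) array the question is after, one possible sketch is to decode every file this way and stack the results. Note that np.stack requires all images to share the same height and width, which the question says they don't, so some normalization such as the resize below is unavoidable (the function name and the common size are hypothetical choices):
import glob
import os.path
import numpy as np
from PIL import Image

def open_images_stacked(images_directory, size=(128, 64)):
    # size is a hypothetical common (width, height) so that stacking works
    pattern = os.path.join(images_directory, "*.png")
    frames = [np.array(Image.open(f).convert('RGB').resize(size))
              for f in sorted(glob.iglob(pattern))]
    return np.stack(frames)  # shape (n, h, w, 3)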
If you have a flattened image and would like to recover the original row and column dimensions, you can apply a heuristic that tests the various possible combinations and checks the "smoothness" of the image along the row axis. This smoothness can be measured, for example, via the mean squared error of consecutive rows. This assumes that the original image has some kind of structure along the row axis, so that the change between consecutive pixels in the original image will be relatively small when compared to other possible shapes.
For example, let's say the original image is 155 x 79 pixels and has been flattened into an array of 155 * 79 == 12245 elements. The prime factorization of this is 5, 31, 79, so the possible row dimensions are all unique products of subsets of these prime factors, i.e. 5, 31, 79, 155, 395, 2449. Now these possible row dimensions, in the following referred to as estimates, can be sorted into two different categories:
Estimates which are a divisor of the original row dimension: 5, 31 and 155. This effectively means that multiple row-skipped copies of the original image are stacked next to each other, so the resulting image retains the original column grouping. Since similar columns remain together, each element of the stack will have roughly the same smoothness. For example, if the estimate is 31, the original shape (31*5, 79) is transformed to (31, 5*79), i.e. only every 5-th row of the original image is considered and five such copies are stacked next to each other. For the original image (i.e. an estimate of 155) length-1 correlations are considered (each pair of consecutive rows is compared), while for an estimate of 31 length-5 correlations are considered (comparing row-pairs that have another 4 rows between them). Since the original image is expected to have some smooth structure, the smoothness should decrease when longer ranges are compared. The decrease will be bigger when the skip-range increases, but it can also completely vanish if the image contains some degree of periodicity along the row axis.
All other estimates: 79, 395, 2449. For estimates of this category, different columns of the original image are mixed in the test image corresponding to the estimate. For example, if the estimate is 79, we have 155 % 79 == 76, i.e. each new row in the test image shifts the original columns by 3 with respect to the previous row. Assuming that the original image varies along the column dimension, these shifts introduce an increasingly strong deviation between the emerging consecutive rows. Since this column shift increases from row to row, the resulting decrease in row-smoothness should be strong unless the number of rows is small. If the original image happens to be column-periodic with the shift of the estimate, this can lead to a perfect agreement, however.
So to summarize, if we compute the smoothness for all row dimension estimates we expect the smoothness to decrease for a wrong estimate and the decrease will be small if the estimate falls in category (1) and bigger if it falls in category (2).
Important: If the images are periodic along either the row or column dimension this can lead to a false estimate.
The implementation needs to cover the following steps:
1) Compute the prime factorization of the length of the flattened image.
2) Compute all unique row dimension estimates from combinations of the prime factors.
3) For each estimate, compute the row-smoothness of the resulting test image, for example via the mean squared error of consecutive rows (strictly speaking this gives a non-smoothness score).
4) Find the best estimate from the scores.
Here is some example code for the implementation:
import itertools as it
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

image = np.array(Image.open('example.jpg'))
original_shape = image.shape[:2]
image = image.reshape(-1, 3)

def compute_prime_factors(n):
    # trial division, yielding each prime factor (with multiplicity)
    i = 2
    while i <= n:
        if n % i == 0:
            n //= i
            yield i
        else:
            i += 1

prime_factors = list(compute_prime_factors(len(image)))
combinations = it.chain.from_iterable(it.combinations(prime_factors, r=i) for i in range(1, len(prime_factors)))
row_dims = sorted({np.prod(x) for x in combinations})

def test_row_dim(r):
    c = len(image) // r
    # cast to float to avoid uint8 wrap-around in the subtraction
    test = image.reshape(r, c, 3).astype(float)
    return np.mean((test[1:] - test[:-1])**2)

scores = [test_row_dim(r) for r in row_dims]
best_estimate = row_dims[np.argmin(scores)]

fig, ax = plt.subplots()
ax.set(xlabel='row dimension', ylabel='score')
ax.set_xscale('log')
ax.plot(row_dims, scores, '-o', label='Estimations')
ax.plot([best_estimate], [np.min(scores)], '*', ms=12, label=f'Best Estimate ({best_estimate})')
ax.axvline(original_shape[0], label=f'Actual Dim ({original_shape[0]})', color='#2ca02c', zorder=-100, lw=1.5, ls='--')
ax.legend()

plt.figure()
plt.imshow(image.reshape(205, -1, 3))  # 205 gives the second best score for the test image below
plt.show()
Let's test it on some image (H x W: 410 x 640):
Photo by Cameron Venti on Unsplash
This produces the following estimate scores:
The peaks to the left of the best estimate are the category (1) estimates that have the smallest row-skip. The prime factorization of 410 and 640 is 2*5*41 and 2**7 * 5 respectively. So the category (1) estimates that get closest to the original row dimension are 205, 82 and 41 (the side peaks from right to left). A decreasing estimate implies an increasing row-skip range and hence an increasing MSE score. The peak to the left of the best estimate corresponds to an estimate of 205, i.e. each second row gets skipped and hence two such row-skipped versions are stacked next to each other:
As you can imagine, by skipping every second row, the image doesn't change too much and the change is the same for the two side-by-side versions. Hence the small difference to the original image's MSE score.

Moving average produces array of different length?

This question has a lot of useful answers on how to get a moving average.
I have tried the two methods of numpy convolution and numpy cumsum and both worked fine on an example dataset, but produced a shorter array on my real data.
The data are spaced by 0.01. The example dataset has a length of 50, the real data tens of thousands. So it must be something about the window size that is causing the problem and I don't quite understand what is going on in the functions.
This is how I define the functions:
import numpy as np

def smoothMAcum(depth, temp, scale):  # moving average by cumsum; scale = window size in m
    dz = np.diff(depth)
    N = int(scale / dz[0])
    cumsum = np.cumsum(np.insert(temp, 0, 0))
    smoothed = (cumsum[N:] - cumsum[:-N]) / N
    return smoothed

def smoothMAconv(depth, temp, scale):  # moving average by numpy convolution
    dz = np.diff(depth)
    N = int(scale / dz[0])
    smoothed = np.convolve(temp, np.ones((N,)) / N, mode='valid')
    return smoothed
Then I implement it:
scale = 5.
smooth = smoothMAconv(dep, data, scale)
but print(len(dep), len(smooth)) returns 81071 80572
and the same happens if I use the other function.
How can I get the smooth array of the same length as the data?
And why did it work on the small dataset? Even if I try different scales (and use the same for the example and for the data), the result in the example has the same length as the original data, but not in the real application.
I considered an effect of nan values, but if I have a nan in the example, it doesn't make a difference.
So where is the problem, if possible to tell without the full dataset?
The second of your approaches is easy to modify to preserve the length, because numpy.convolve supports the parameter mode='same'.
np.convolve(temp, np.ones((N,))/N, mode='same')
This is made possible by zero-padding the data set temp on both sides, which will inevitably have some effect at the boundaries unless your data happens to be 0 near the boundaries. Example:
import numpy as np
import matplotlib.pyplot as plt

N = 10
x = np.linspace(0, 2, 100)
y = x**2 + np.random.uniform(size=x.shape)
y_smooth = np.convolve(y, np.ones((N,))/N, mode='same')

plt.plot(x, y, 'r.')
plt.plot(x, y_smooth)
plt.show()
The boundary effect of zero-padding is very visible at the right end, where the data points are about 4-5 but are padded by 0.
To reduce this undesired effect, use numpy.pad for more intelligent padding and revert to mode='valid' for the convolution. The pad widths must be chosen such that in total N-1 elements are added, where N is the size of the moving window.
y_padded = np.pad(y, (N//2, N-1-N//2), mode='edge')
y_smooth = np.convolve(y_padded, np.ones((N,))/N, mode='valid')
Padding by the edge values of the array looks much better.
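Folding this back into the question's helpers, a sketch of a length-preserving variant (same depth/temp/scale arguments as above; the function name is hypothetical) might be:
def smoothMApad(depth, temp, scale):  # length-preserving moving average
    dz = np.diff(depth)
    N = int(scale / dz[0])
    # pad with edge values so the 'valid' convolution returns len(temp) samples
    padded = np.pad(temp, (N // 2, N - 1 - N // 2), mode='edge')
    return np.convolve(padded, np.ones(N) / N, mode='valid')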

Python fractal box count - fractal dimension

I have some images for which I want to calculate the Minkowski/box count dimension to determine the fractal characteristics in the image. Here are 2 example images:
10.jpg:
24.jpg:
I'm using the following code to calculate the fractal dimension:
import numpy as np
import scipy.misc  # note: scipy.misc.imread needs an older SciPy; imageio.imread is the modern replacement

def rgb2gray(rgb):
    r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray

def fractal_dimension(Z, threshold=0.9):
    # Only for 2d image
    assert(len(Z.shape) == 2)

    # From https://github.com/rougier/numpy-100 (#87)
    def boxcount(Z, k):
        S = np.add.reduceat(
            np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
            np.arange(0, Z.shape[1], k), axis=1)
        # We count boxes that are neither empty (sum 0) nor full (sum k*k)
        return len(np.where((S > 0) & (S < k*k))[0])

    # Transform Z into a binary array
    Z = (Z < threshold)

    # Minimal dimension of image
    p = min(Z.shape)

    # Greatest power of 2 less than or equal to p
    n = 2**np.floor(np.log(p)/np.log(2))

    # Extract the exponent
    n = int(np.log(n)/np.log(2))

    # Build successive box sizes (from 2**n down to 2**2)
    sizes = 2**np.arange(n, 1, -1)

    # Actual box counting with decreasing size
    counts = []
    for size in sizes:
        counts.append(boxcount(Z, size))

    # Fit the successive log(sizes) with log(counts)
    coeffs = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -coeffs[0]

I = rgb2gray(scipy.misc.imread("24.jpg"))
print("Minkowski–Bouligand dimension (computed): ", fractal_dimension(I))
From the literature I've read, it has been suggested that natural scenes (e.g. 24.jpg) are more fractal in nature, and thus should have a larger fractal dimension value.
The results it gives me are in the opposite direction of what the literature would suggest:
10.jpg: 1.259
24.jpg: 1.073
I would expect the fractal dimension for the natural image to be larger than for the urban one.
Am I calculating the value incorrectly in my code? Or am I just interpreting the results incorrectly?
With the fractal dimension of something physical, the dimension might converge at different stages to different values. For example, a very thin line (but of finite width) would initially seem one-dimensional, then eventually two-dimensional as its width becomes comparable to the size of the boxes used.
Let's look at the dimensions that you have produced:
What do you see? Well, the linear fits are not so good, and the dimension is tending towards a value of two.
To diagnose, let's take a look at the grey-scale images produced with the threshold that you have (that is, 0.9):
The nature picture has almost become an ink blob. The dimension would go to a value of 2 very soon, as the graphs told us, because we have pretty much lost the image.
And now with a threshold of 50?
With new linear fits that are much better, the dimensions are 1.6 and 1.8 for the urban and nature images respectively. Keep in mind that the urban picture actually has a lot of structure to it, in particular on the textured walls.
In the future, good threshold values would be ones closer to the mean of the grey-scale image; that way your image does not turn into a blob of ink!
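For instance, a small sketch of that advice, reusing the functions above (the exact threshold remains a tuning choice, not a fixed rule):
I = rgb2gray(scipy.misc.imread("24.jpg"))
# a data-driven threshold instead of the hard-coded 0.9
print(fractal_dimension(I, threshold=I.mean()))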
A good text book on this is "Fractals everywhere" by Michael F. Barnsley.

Fastest way to get bounding boxes around segments in a label map

A 3D label map is a matrix in which every pixel (voxel) has an integer label. The labels are expected to be contiguous, meaning that a segment with label k will not be fragmented.
Given such a label map (segmentation), what is the fastest way to obtain the coordinates of a minimum bounding box around each segment in Python?
I have tried the following:
Iterate through the matrix using multiindex iterator (from numpy.nditer) and construct a reverse index dictionary. This means that for every label you get the 3 coordinates of every voxel where the label is present.
For every label get the max and min of each coordinate.
The good thing is that you get all the location information in one O(N) pass. The bad thing is that I don't need this detailed information; I just need the extremities. So there might be a faster way to do this, using some numpy functions which are faster than so many list appends. Any suggestions?
The one pass through the matrix takes about 8 seconds on my machine, so it would be great to get rid of it. To give an idea of the data, there are a few hundred labels in a label map. Sizes of the label map can be 700x300x30 or 300x300x200 or something similar.
Edit: Now storing only updated max and min per coordinate for every label. This removes the need to maintain and store all these large lists (append).
If I understood your problem correctly, you have groups of voxels, and you would like to have the extremes of a group in each axis.
Let's define:
arr: 3D array of integer labels
labels: list of labels (integers 0..labmax)
The code:
import numpy as np

# number of the highest label:
labmax = np.max(labels)

# first and last positions along each axis; both start at the highest
# int32 value so that missing labels are easy to recognize afterwards
b_first = np.iinfo('int32').max * np.ones((3, labmax + 1), dtype='int32')
b_last = np.iinfo('int32').max * np.ones((3, labmax + 1), dtype='int32')

# run through all of the dimensions making 2D slices and marking all existing labels to b
for dim in range(3):
    # create a generic slice object to make the slices
    sl = [slice(None), slice(None), slice(None)]
    bf = b_first[dim]
    bl = b_last[dim]
    # go through all slices in this dimension
    for k in range(arr.shape[dim]):
        # create the slice object
        sl[dim] = k
        # update the last "seen" position of every label in this slice
        bl[arr[tuple(sl)].flatten()] = k
        # if we have smaller values in "last" than in "first", update "first"
        bf[:] = np.clip(bf, None, bl)
After this operation we have six vectors giving the smallest and largest indices along each axis. For example, the bounding values along the second axis of label 13 are b_first[1][13] and b_last[1][13]. If some label is missing, all of its corresponding b_first and b_last entries keep the maximum int32 value.
I tried this on my computer, and for a (300, 300, 200) array it takes approximately 1 second to find the values.
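As a hypothetical convenience on top of these six vectors (not part of the original approach), the bounding box of a single label could be assembled like this:
def bbox_of(label):
    # returns ((z0, z1), (y0, y1), (x0, x1)) index bounds for the label,
    # or None if the label never occurs (the int32-max sentinel survives)
    if b_last[0][label] == np.iinfo('int32').max:
        return None
    return tuple((b_first[d][label], b_last[d][label]) for d in range(3))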
