Vectorized sliding / rolling numpy nanmean

Vectorized sliding / rolling numpy nanmean - python

I currently use this code to do a moving window average:
n=500
x_copy=np.hstack((np.full(n,np.nan),copy.deepcopy(x),np.full(n,np.nan)))
x_values=[]
for i in range(n,len(x)+n):
x_values.append(np.nanmean(x[i-n:i+n+1]))
plt.plot(x_values)
with x the array I'm working on and n is half the length of the window. However, I need to do this quickly, as I have to do roughly 4400*10 of this operation, with arrays around 60000 elements in length. After searching for a while, I found that np.convolve should work, so I have this code instead:
plt.plot(np.convolve(x, np.ones(((2*n),))/(2*n), mode='valid'),zorder=2)
While this is really fast, it's not doing exactly what I need it to do, as it seems to stop 500*2 units before the end of the array. For reference, here is the image of the plots of both of them: the blue is my own code, the orange is the convolution. I want to use convolve to speed up my moving window average, though I don't know how.
Reference

Set those NaNs to 0s and then use np.convolve on those masked versions -
# Window-size
W = 2*n+1
# Non-nans mask
m = ~np.isnan(x)
# "Masked" input array
x0 = np.where(m,x,0)
# Setup conv kernel and perform conv on x0 and mask m for the counts to divide
K = np.ones(W)
out = (np.convolve(x0,K)/np.convolve(m,K))[W-1:]

Related

Finding nearest pixel in defined color space - quick implementation using numpy

I have been working on a task, where I implemented median cut for image quantization – representing the whole image by only limited set of pixels. I implemented the algorithm and now I am trying to implement the part, where I assign each pixel to a representant from the set found by median cut. So, I have variable 'color_space', which is 2d ndarray of shape (n,3), where n is the number of representatives. Then I have variable 'img', which is the original image of shape (rows, columns, 3).
Now I want to find the nearest pixel (bin) for each pixel from the image based on euclidean distance. I was able to come with this solution:
for row in range(img.shape[0]):
for column in range(img.shape[1]):
img[row][column] = color_space[np.linalg.norm(color_space - img[row][column], axis=1).argmin()]
What it does is, that for each pixel from the image, it computes the vector if distances from each of the bins and then it takes the closest one.
Problem is, that this solution is quite slow and I would like to vectorize it - instead of getting vector for each pixel, I would like to get a matrix, where for example first row would be the first vector of distances computed in my code etc...
This problem could be converted into a problem, where I want to do a matrix multiplication, but instead of getting dot product of two vectors, I would get their euclidean distance. Is there some good approach to such problems? Some general solution in numpy, if we want to do 'matrix multiplication' in numpy, but the function Rn x Rn -> R does not need to be dot product, but for example euclidean distance. Of course, for the multiplication, the original image should be resized to (row*columns, 3), but that is a detail.
I have been studying the documentation and searching internet, but didn't find any good approach.
Please note that I don't want others to solve my assignment, the solution I came up with is totally ok, I am just curious whether I could speed it up as I try to learn numpy properly.
Thanks for any advices!

Below is MWE for vectorizing your problem. See comments for explanation.
import numpy
# these are just random array declaration to work with.
image = numpy.random.rand(32, 32, 3)
color_space = numpy.random.rand(10,3)
# your code. I modified it to pick indexes
result = numpy.zeros((32,32))
for row in range(image.shape[0]):
for column in range(image.shape[1]):
result[row][column] = numpy.linalg.norm(color_space - image[row][column], axis=1).argmin()
result = result.astype(numpy.int)
# here we reshape for broadcasting correctly.
image = image.reshape(1,32,32,3)
color_space = color_space.reshape(10, 1,1,3)
# compute the norm on last axis, which is RGB values
result_norm = numpy.linalg.norm(image-color_space, axis=3)
# now compute the vectorized argmin
result_vectorized = result_norm.argmin(axis=0)
print(numpy.allclose(result, result_vectorized))
Eventually, you can get the correct solution by doing color_space[result]. You may have to remove the extra dimensions that you add in color space to get correct shapes in this final operation.

I think this approach might be a bit more numpy-ish/pythonic:
import numpy as np
from typing import *
from numpy import linalg as LA
# assume color_space is defined as a constant somewhere above and is of shape (n,3)
nearest_pixel_idxs: Callable[[np.ndarray], int] = lambda rgb: return LA.norm(color_space - rgb, axis=1).argmin()
img: np.ndarray = color_space[np.apply_along_axis(nearest_pixel_idxs, 1, img.reshape((-1, 3)))]
Why this solution might be more efficient:
It relies on the parallelizable apply_along_axis function nearest_pixel_idxs() rather than the nested for-loops. This is made possible by reshaping img and thereby removing the need for double indexing.
It avoids repeated writes into color_space by only indexing into it once at the very end.
Let me know if you would like me to go into greater depth on any of this - happy to help.

You could first broadcast to get all the combinations and then calculate each norm. You could then pick the smallest from there.
a = np.array([[1,2,3],
[2,3,4],
[3,4,5]])
b = np.array([[1,2,3],
[3,4,5]])
a = np.repeat(a.reshape(a.shape[0],1,3), b.shape[0], axis = 1)
b = np.repeat(b.reshape(1,b.shape[0],3), a.shape[0], axis = 0)
np.linalg.norm(a - b, axis = 2)
Each row of the result represents the distance of the row in a to each of the representatives in b
array([[0. , 3.46410162],
[1.73205081, 1.73205081],
[3.46410162, 0. ]])
You can then use argmin to get the final results.
IMO it is better to use (what #Umang Gupta proposed) numpy's automatic broadcasting than using repeat.

Can this be done faster with numpy?

There's a color image, a numpy array of shape (h,w,3) with N=h*w pixels; there's an array labels of shape (h,w), each label an integer between 1 and M. N is 10^6-10^7, M is 10^3-10^4.
I need to produce
a result image (h,w,3) where the color of each pixel labelled l is the mean color of all pixels labelled l. I.e.:
def recolor1(image, labels):
result = np.empty(shape=(h,w,3))
for label in np.unique(labels):
mask = labels==label
mean = np.mean(image[mask], axis=0)
result[mask] = mean
return result
The code is straightforward, but runs in O(M.N) (the computation of mask is O(N) and the loop runs M times).
An O(N) recolor2 is possible. Basically you go over the labels and image pixels twice. First to compute an auxiliary array, indexed by label, where you keep the sums of each primary and the number of pixels for that label. Then you compute the averages for each label. Then you go over labels and pixels again, computing result. The O(M) time to find the averages is noise.
With recolor2 written in Python, recolor1 and recolor2 break even for N=1000000 and M=1000 at ~4s. As expected, recolor1's time grows linearly to ~20s for M=5000, while recolor2's remains essentially the same.
4s for a relatively small image is not great and it will get much worse for larger images. I'm no expert in numpy and associated libraries. Is there an O(N) solution there?

Let's try np.bincount and loop over the channels:
result = np.stack([np.bincount(labels.flat, weights=img[...,i].flat)[labels-1]
for i in range(3)],
axis=-1)
which takes about 35ms on my system with h,w,M = 1000,1000,1000.
Note This compute the sum, but mean should be easy enough.

Central difference with Convolution

So basically I am trying to do finite differencing on a 2d array without doing too many for loops. I would like to have the Hessian matrix of the array, and the gradient. So I need both the first order and second order derivative of the array.
This can be achieved by evaluating the following equation on on the array.
To deal with boundaries we only compute it for the interior points, so code for this derivate might look something like the following
arr = np.random.rand(16).reshape(4,4)
result = np.zeros_like(arr)
w, h = arr.shape
for i in range(1, w-1):
for j in range(1, h-1):
result[i,j] = (arr[i+1, j] - arr[i-1, j]) / (2*dx)
This gives the correct answer but can be very slow compared nu numpy operations, so I thought to myself. This is basically just a convolution with a kernel that looks like this
kernel = [1, 0 , -1]
So we execute the following code
from scipy.sigmal import convolve
result = np.pad((convolve(arr,kernel,mode='same',
method = 'direct')/(2*dx))[1:-1, 1:-1], 1).T
Since we are only dealing with the interior points, we cut them of and pad with zeros afterwards, to mimick what would happened in the previous naive case.
This works! But with some arrays, the mean squared error between the naive case and the convolution case sky rockets. So it seems that the numerical error increases very much for some cases.
I would like the speed gained by convolution with the stability of the naive case. Any help?

We can simply slice and operate. Hence, after output initialization, do -
result[1:-1,1:-1] = (arr[2:,1:-1] - arr[:-2,1:-1])/(2*dx)
Convolution IMHO would be an overkill when working with NumPy arrays, as slicing arrays are virtually free on memory and performance. Being compute heavy, one can look into numexpr though to leverage multi-cores.

Sliding Gabor Filter in python

Taken from the gabor filter example from skimage calculating a gabor filter for an image is easy:
import numpy as np
from scipy import ndimage as nd
from skimage import data
from skimage.util import img_as_float
from skimage.filter import gabor_kernel
brick = img_as_float(data.load('brick.png'))
kernel = np.real(gabor_kernel(0.15, theta = 0.5 * np.pi,sigma_x=5, sigma_y=5))
filtered = nd.convolve(brick, kernel, mode='reflect')
mean = filtered.mean()
variance = filtered.var()
brick is simply a numpy array. Suppose I have a 5000*5000 numpy array. What I want to achieve is to generate two new 5000*5000 numpy arrays where the pixels are the mean and var values of the gabor filter of the 15*15 window centered on them.
Could anyone help me achieve this?
EDIT
¿Why did I get downvoted? Anyway, to clarify I show an example on how to calculate a gabor filter on a single image. I would like to simply calculate a gabor filter on small square subsets of a very large image (hence the sliding window).

There no standard methods to do this (that I know of), but you can do it yourself directly.
Each pixel in the convolution is the sum of the values of the shift gabor filter times the image pixels. That is, each pixel in the convolution is basically the mean to within a constant normalization factor, so filtered is basically your mean.
The variance is a bit more difficult since that is the sum of the squares, and of course, you need to calculate the sqaures before you calculate the sums. But, you can do this easy enough by pre-squaring both the image and the kernel, that is:
N = kernel.shape[0]*kernel.shape[1]
mean = nd.convolve(brick, kernel, mode='reflect')/N
var = nd.convolve(brick*brick, kernel*kernel, mode='reflect')/N - mean*mean

If you just want to calculate the sliding average of an image (convolution with a square kernel with all 1's), the fast method is:
# fsize is the filter size in pixels
# integrate in the X direction
r_sum = numpy.sum(img[:, :fsize], axis=1)
r_diff = img[:, fsize:] - img[:, :-fsize]
r_int = numpy.cumsum(numpy.hstack((r_sum.reshape(-1,1), r_diff)), axis=1)
# integrate in the Y direction
c_sum = numpy.sum(r_img[:fsize, :], axis=0)
c_diff = r_img[fsize:, :] - r_img[:-fsize, :]
c_int = numpy.cumsum(numpy.vstack((c_sum, c_diff)), axis=0)
# now we have an array of sums, average can be obtained by division
avg_img = c_int / (f_size * f_size)
This method returns an image which is size-1 pixels smaller in both directions, so you'll have to take care of border effects yourself. The edge most pixels are bad anyway, but it is up to you to choose the correct border fill, if you need one. The algorithm is the fastest way to obtain the mean (fewest calculations), especially much faster than numpy.convolve.
Similar trickery can be used in calculating the variance, if both the image and its square are averaged as above. Then
npts = fsize * fsize
variance = (rolling_sum(img**2) - rolling_sum(img)/npts) / npts
where rolling_sum is a sliding sum (i.e. the algorithm above without the last division). So, only two rolling sums (image and its square) are required to calculate the rolling variance.
(Warning: the code above is untested, it is there just to illustrate the idea.)

Using strides for an efficient moving average filter

I recently learned about strides in the answer to this post, and was wondering how I could use them to compute a moving average filter more efficiently than what I proposed in this post (using convolution filters).
This is what I have so far. It takes a view of the original array then rolls it by the necessary amount and sums the kernel values to compute the average. I am aware that the edges are not handled correctly, but I can take care of that afterward... Is there a better and faster way? The objective is to filter large floating point arrays up to 5000x5000 x 16 layers in size, a task that scipy.ndimage.filters.convolve is fairly slow at.
Note that I am looking for 8-neighbour connectivity, that is a 3x3 filter takes the average of 9 pixels (8 around the focal pixel) and assigns that value to the pixel in the new image.
import numpy, scipy
filtsize = 3
a = numpy.arange(100).reshape((10,10))
b = numpy.lib.stride_tricks.as_strided(a, shape=(a.size,filtsize), strides=(a.itemsize, a.itemsize))
for i in range(0, filtsize-1):
if i > 0:
b += numpy.roll(b, -(pow(filtsize,2)+1)*i, 0)
filtered = (numpy.sum(b, 1) / pow(filtsize,2)).reshape((a.shape[0],a.shape[1]))
scipy.misc.imsave("average.jpg", filtered)
EDIT Clarification on how I see this working:
Current code:
use stride_tricks to generate an array like [[0,1,2],[1,2,3],[2,3,4]...] which corresponds to the top row of the filter kernel.
Roll along the vertical axis to get the middle row of the kernel [[10,11,12],[11,12,13],[13,14,15]...] and add it to the array I got in 1)
Repeat to get the bottom row of the kernel [[20,21,22],[21,22,23],[22,23,24]...]. At this point, I take the sum of each row and divide it by the number of elements in the filter, giving me the average for each pixel, (shifted by 1 row and 1 col, and with some oddities around edges, but I can take care of that later).
What I was hoping for is a better use of stride_tricks to get the 9 values or the sum of the kernel elements directly, for the entire array, or that someone can convince me of another more efficient method...

For what it's worth, here's how you'd do it using "fancy" striding tricks. I was going to post this yesterday, but got distracted by actual work! :)
#Paul & #eat both have nice implementations using various other ways of doing this. Just to continue things from the earlier question, I figured I'd post the N-dimensional equivalent.
You're not going to be able to significantly beat scipy.ndimage functions for >1D arrays, however. (scipy.ndimage.uniform_filter should beat scipy.ndimage.convolve, though)
Moreover, if you're trying to get a multidimensional moving window, you risk having memory usage blow up whenever you inadvertently make a copy of your array. While the initial "rolling" array is just a view into the memory of your original array, any intermediate steps that copy the array will make a copy that is orders of magnitude larger than your original array (i.e. Let's say that you're working with a 100x100 original array... The view into it (for a filter size of (3,3)) will be 98x98x3x3 but use the same memory as the original. However, any copies will use the amount of memory that a full 98x98x3x3 array would!!)
Basically, using crazy striding tricks is great for when you want to vectorize moving window operations on a single axis of an ndarray. It makes it really easy to calculate things like a moving standard deviation, etc with very little overhead. When you want to start doing this along multiple axes, it's possible, but you're usually better off with more specialized functions. (Such as scipy.ndimage, etc)
At any rate, here's how you do it:
import numpy as np
def rolling_window_lastaxis(a, window):
"""Directly taken from Erik Rigtorp's post to numpy-discussion.
<http://www.mail-archive.com/numpy-discussion#scipy.org/msg29450.html>"""
if window < 1:
raise ValueError, "`window` must be at least 1."
if window > a.shape[-1]:
raise ValueError, "`window` is too long."
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
def rolling_window(a, window):
if not hasattr(window, '__iter__'):
return rolling_window_lastaxis(a, window)
for i, win in enumerate(window):
if win > 1:
a = a.swapaxes(i, -1)
a = rolling_window_lastaxis(a, win)
a = a.swapaxes(-2, i)
return a
filtsize = (3, 3)
a = np.zeros((10,10), dtype=np.float)
a[5:7,5] = 1
b = rolling_window(a, filtsize)
blurred = b.mean(axis=-1).mean(axis=-1)
So what we get when we do b = rolling_window(a, filtsize) is an 8x8x3x3 array, that's actually a view into the same memory as the original 10x10 array. We could have just as easily used different filter size along different axes or operated only along selected axes of an N-dimensional array (i.e. filtsize = (0,3,0,3) on a 4-dimensional array would give us a 6 dimensional view).
We can then apply an arbitrary function to the last axis repeatedly to effectively calculate things in a moving window.
However, because we're storing temporary arrays that are much bigger than our original array on each step of mean (or std or whatever), this is not at all memory efficient! It's also not going to be terribly fast, either.
The equivalent for ndimage is just:
blurred = scipy.ndimage.uniform_filter(a, filtsize, output=a)
This will handle a variety of boundary conditions, do the "blurring" in-place without requiring a temporary copy of the array, and be very fast. Striding tricks are a good way to apply a function to a moving window along one axis, but they're not a good way to do it along multiple axes, usually....
Just my $0.02, at any rate...

I'm not familiar enough with Python to write out code for that, but the two best ways to speed up convolutions is to either separate the filter or to use the Fourier transform.
Separated filter : Convolution is O(M*N), where M and N are number of pixels in the image and the filter, respectively. Since average filtering with a 3-by-3 kernel is equivalent to filtering first with a 3-by-1 kernel and then a 1-by-3 kernel, you can get (3+3)/(3*3) = ~30% speed improvement by consecutive convolution with two 1-d kernels (this obviously gets better as the kernel gets larger). You may still be able to use stride tricks here, of course.
Fourier Transform : conv(A,B) is equivalent to ifft(fft(A)*fft(B)), i.e. a convolution in direct space becomes a multiplication in Fourier space, where A is your image and B is your filter. Since the (element-wise) multiplication of the Fourier transforms requires that A and B are the same size, B is an array of size(A) with your kernel at the very center of the image and zeros everywhere else. To place a 3-by-3 kernel at the center of an array, you may have to pad A to odd size. Depending on your implementation of the Fourier transform, this can be a lot faster than the convolution (and if you apply the same filter multiple times, you can pre-compute fft(B), saving another 30% of computation time).

Lets see:
It's not so clear form your question, but I'm assuming now that you'll like to improve significantly this kind of averaging.
import numpy as np
from numpy.lib import stride_tricks as st
def mf(A, k_shape= (3, 3)):
m= A.shape[0]- 2
n= A.shape[1]- 2
strides= A.strides+ A.strides
new_shape= (m, n, k_shape[0], k_shape[1])
A= st.as_strided(A, shape= new_shape, strides= strides)
return np.sum(np.sum(A, -1), -1)/ np.prod(k_shape)
if __name__ == '__main__':
A= np.arange(100).reshape((10, 10))
print mf(A)
Now, what kind of performance improvements you would actually expect?
Update:
First of all, a warning: the code in it's current state does not adapt properly to the 'kernel' shape. However that's not my primary concern right now (anyway the idea is there allready how to adapt properly).
I have just chosen the new shape of a 4D A intuitively, for me it really make sense to think about a 2D 'kernel' center to be centered to each grid position of original 2D A.
But that 4D shaping may not actually be the 'best' one. I think the real problem here is the performance of summing. One should to be able to find 'best order' (of the 4D A) inorder to fully utilize your machines cache architecture. However that order may not be the same for 'small' arrays which kind of 'co-operates' with your machines cache and those larger ones, which don't (at least not so straightforward manner).
Update 2:
Here is a slightly modified version of mf. Clearly it's better to reshape to a 3D array first and then instead of summing just do dot product (this has the advantage all so, that kernel can be arbitrary). However it's still some 3x slower (on my machine) than Pauls updated function.
def mf(A):
k_shape= (3, 3)
k= np.prod(k_shape)
m= A.shape[0]- 2
n= A.shape[1]- 2
strides= A.strides* 2
new_shape= (m, n)+ k_shape
A= st.as_strided(A, shape= new_shape, strides= strides)
w= np.ones(k)/ k
return np.dot(A.reshape((m, n, -1)), w)

One thing I am confident needs to be fixed is your view array b.
It has a few items from unallocated memory, so you'll get crashes.
Given your new description of your algorithm, the first thing that needs fixing is the fact that you are striding outside the allocation of a:
bshape = (a.size-filtsize+1, filtsize)
bstrides = (a.itemsize, a.itemsize)
b = numpy.lib.stride_tricks.as_strided(a, shape=bshape, strides=bstrides)
Update
Because I'm still not quite grasping the method and there seems to be simpler ways to solve the problem, I'm just going to put this here:
A = numpy.arange(100).reshape((10,10))
shifts = [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]
B = A[1:-1, 1:-1].copy()
for dx,dy in shifts:
xstop = -1+dx or None
ystop = -1+dy or None
B += A[1+dx:xstop, 1+dy:ystop]
B /= 9
...which just seems like the straightforward approach. The only extraneous operation is that it has allocate and populate B only once. All the addition, division and indexing has to be done regardless. If you are doing 16 bands, you still only need to allocate B once if your intent is to save an image. Even if this is no help, it might clarify why I don't understand the problem, or at least serve as a benchmark to time the speedups of other methods. This runs in 2.6 sec on my laptop on a 5k x 5k array of float64's, 0.5 of which is the creation of B

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Vectorized sliding / rolling numpy nanmean - python

Related

Finding nearest pixel in defined color space - quick implementation using numpy

Can this be done faster with numpy?

Central difference with Convolution

Sliding Gabor Filter in python

Using strides for an efficient moving average filter

Categories

Resources