Is there an easy way to calculate a running variance filter on an image using Python/NumPy/Scipy? By running variance image I mean the result of calculating sum((I - mean(I))^2)/nPixels for each sub-window I in the image.
Since the images are quite large (12000x12000 pixels), I want to avoid the overhead of converting the arrays between formats just to be able to use a different library and then convert back.
I guess I could do this manually by finding the mean using something like
kernel = np.ones((winSize, winSize))/winSize**2
image_mean = scipy.ndimage.convolve(image, kernel)
diff = (image - image_mean)**2
# Calculate sum over winSize*winSize sub-images
# Subsample result
but it would be much nicer to have something like the stdfilt-function from Matlab.
Can anyone point me in the direction of a library that has this functionality AND supports numpy arrays, or hint at/provide a way to do this in NumPy/SciPy?
Simpler solution and also faster: use SciPy's ndimage.uniform_filter
import numpy as np
from scipy import ndimage
rows, cols = 500, 500
win_rows, win_cols = 5, 5
img = np.random.rand(rows, cols)
win_mean = ndimage.uniform_filter(img, (win_rows, win_cols))
win_sqr_mean = ndimage.uniform_filter(img**2, (win_rows, win_cols))
win_var = win_sqr_mean - win_mean**2
The "stride trick" is a beautiful trick, but it is 4 times slower and not that readable.
generic_filter is 20 times slower than the strides approach...
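As a sanity check, the uniform_filter approach can be compared against generic_filter with np.var on a small array. This is only a sketch: the helper name running_variance is made up for illustration, and the clipping at zero is an added assumption to guard against tiny negative values caused by floating-point cancellation.

```python
import numpy as np
from scipy import ndimage

def running_variance(img, size):
    """Windowed variance as mean(x**2) - mean(x)**2 (hypothetical helper)."""
    img = np.asarray(img, dtype=np.float64)
    mean = ndimage.uniform_filter(img, size)
    sqr_mean = ndimage.uniform_filter(img * img, size)
    # Round-off can push the difference slightly below zero, so clip it.
    return np.maximum(sqr_mean - mean * mean, 0.0)

rng = np.random.default_rng(0)
small = rng.random((40, 50))

fast = running_variance(small, (5, 5))
# Reference result: much slower, same default edge handling (mode='reflect').
slow = ndimage.generic_filter(small, np.var, size=(5, 5))
```

Both filters use the same default boundary mode, so the two results should agree to within floating-point error, including at the edges.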
You can use numpy.lib.stride_tricks.as_strided to get a windowed view of your image:
import numpy as np
from numpy.lib.stride_tricks import as_strided
rows, cols = 500, 500
win_rows, win_cols = 5, 5
img = np.random.rand(rows, cols)
win_img = as_strided(img, shape=(rows-win_rows+1, cols-win_cols+1,
win_rows, win_cols),
strides=img.strides*2)
And now win_img[i, j] is the (win_rows, win_cols) array with the top left corner at position [i, j]:
>>> img[100:105, 100:105]
array([[ 0.34150754, 0.17888323, 0.67222354, 0.9020784 , 0.48826682],
[ 0.68451774, 0.14887515, 0.44892615, 0.33352743, 0.22090103],
[ 0.41114758, 0.82608407, 0.77190533, 0.42830363, 0.57300759],
[ 0.68435626, 0.94874394, 0.55238567, 0.40367885, 0.42955156],
[ 0.59359203, 0.62237553, 0.58428725, 0.58608119, 0.29157555]])
>>> win_img[100,100]
array([[ 0.34150754, 0.17888323, 0.67222354, 0.9020784 , 0.48826682],
[ 0.68451774, 0.14887515, 0.44892615, 0.33352743, 0.22090103],
[ 0.41114758, 0.82608407, 0.77190533, 0.42830363, 0.57300759],
[ 0.68435626, 0.94874394, 0.55238567, 0.40367885, 0.42955156],
[ 0.59359203, 0.62237553, 0.58428725, 0.58608119, 0.29157555]])
You have to be careful, though, not to turn your windowed view of the image into a windowed copy of it: in my example that would require 25 times more storage. I believe numpy 1.7 lets you select more than one axis, so you could then simply do:
>>> np.var(win_img, axis=(-1, -2))
I am stuck with numpy 1.6.2, so I cannot test that. The other option, which may fail with not-so-large windows, would be to do, if I remember my math correctly:
>>> win_mean = np.sum(np.sum(win_img, axis=-1), axis=-1)/win_rows/win_cols
>>> win_sqr_mean = np.sum(np.sum(win_img**2, axis=-1), axis=-1)/win_rows/win_cols
>>> win_var = win_sqr_mean - win_mean**2
And now win_var is an array of shape
>>> win_var.shape
(496, 496)
and win_var[i, j] holds the variance of the (5, 5) window with top left corner at [i, j].
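On newer NumPy versions (1.20 and later) the same windowed view is available without computing strides by hand, through sliding_window_view. A minimal sketch, using a smaller random image than above to keep the temporaries small:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
img = rng.random((200, 200))

# Read-only view of shape (196, 196, 5, 5); no pixel data is copied here.
win_img = sliding_window_view(img, (5, 5))

# Variance over the two window axes; this step does allocate temporaries.
win_var = win_img.var(axis=(-1, -2))
```

As with as_strided, win_var[i, j] is the variance of the (5, 5) window whose top left corner is at [i, j].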
After a bit of optimization we came up with this function for a generic 3D image:
import numpy
def variance_filter(img, VAR_FILTER_SIZE):
    from numpy.lib.stride_tricks import as_strided
    WIN_SIZE = (2 * VAR_FILTER_SIZE) + 1
    if VAR_FILTER_SIZE % 2 == 0:
        print('Warning, VAR_FILTER_SIZE must be ODD Integer number ')
    # hack -- this could probably be an input to the function but Alessandro is lazy
    WIN_DIMS = [WIN_SIZE, WIN_SIZE, WIN_SIZE]
    # Check that there is a 3D image input.
    if len(img.shape) != 3:
        print("\t variance_filter: Are you sure that you passed me a 3D image?")
        return -1
    else:
        DIMS = img.shape
    # Set up a windowed view on the data... this will have a border removed compared to the img_in
    img_strided = as_strided(img, shape=(DIMS[0]-WIN_DIMS[0]+1, DIMS[1]-WIN_DIMS[1]+1, DIMS[2]-WIN_DIMS[2]+1, WIN_DIMS[0], WIN_DIMS[1], WIN_DIMS[2]), strides=img.strides*2)
    # Calculate variance, vectorially
    win_mean = numpy.sum(numpy.sum(numpy.sum(img_strided, axis=-1), axis=-1), axis=-1) / (WIN_DIMS[0]*WIN_DIMS[1]*WIN_DIMS[2])
    # As per http://en.wikipedia.org/wiki/Variance, we are removing the mean from every window,
    # then squaring the result.
    # Casting to 64 bit float inside, because the numbers (at least for our images) get pretty big
    win_var = numpy.sum(numpy.sum(numpy.sum(((img_strided.T.astype('<f8') - win_mean.T.astype('<f8'))**2).T, axis=-1), axis=-1), axis=-1) / (WIN_DIMS[0]*WIN_DIMS[1]*WIN_DIMS[2])
    # Prepare an output image of the right size, in order to replace the border removed with the windowed view call
    out_img = numpy.zeros(DIMS, dtype='<f8')
    # copy the result into the interior, leaving a zero border (integer division for the slice bounds)
    out_img[WIN_DIMS[0]//2:DIMS[0]-WIN_DIMS[0]+1+WIN_DIMS[0]//2, WIN_DIMS[1]//2:DIMS[1]-WIN_DIMS[1]+1+WIN_DIMS[1]//2, WIN_DIMS[2]//2:DIMS[2]-WIN_DIMS[2]+1+WIN_DIMS[2]//2] = win_var
    # output
    return out_img.astype('>f4')
You can use scipy.ndimage.generic_filter. I can't test with matlab, but perhaps this gives you what you're looking for:
import numpy as np
import scipy.ndimage as ndimage
subs = 10 # this is the size of the (square) sub-windows
img = np.random.rand(500, 500)
img_std = ndimage.generic_filter(img, np.std, size=subs)
You can make the sub-windows of arbitrary sizes using the footprint keyword. See this question for an example.
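To illustrate the footprint keyword, here is a small sketch that uses a roughly circular window instead of a square one. The 5x5 disk of radius 2 is an arbitrary choice for the example, not something taken from the question.

```python
import numpy as np
from scipy import ndimage

# Boolean 5x5 footprint: True inside a disk of radius 2 around the centre.
yy, xx = np.mgrid[-2:3, -2:3]
disk = (xx**2 + yy**2) <= 4

rng = np.random.default_rng(0)
img = rng.random((60, 60))

# np.std receives only the pixels selected by the footprint, as a 1-D array.
img_std = ndimage.generic_filter(img, np.std, footprint=disk)
```

Away from the edges, img_std[i, j] is the standard deviation of the pixels of img[i-2:i+3, j-2:j+3] that fall under the disk.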
I want to create a salt and pepper noise function.
The input is noise_density, i.e. the amount of pixels in the output image that are noise, and the return value should be the noisy image data source.
def salt_pepper(noise_density):
    noisesource = ColumnDataSource(data={'image': [noiseImage]})
    return noisesource
This function returns an image that is [density]x[density] pixels, using numpy to generate a random array and using PIL to generate the image itself from the array.
def salt_pepper(density):
    imarray = numpy.random.rand(density,density,3) * 255
    return Image.fromarray(imarray.astype('uint8')).convert('L')
Now, for example, you could run
salt_pepper(500)
To generate an image file that is 500x500px.
Of course, make sure to
import numpy
from PIL import Image
I came up with a vectorized solution which I'm sure can be improved/simplified. Although the interface is not exactly as the requested one, the code is pretty straightforward (and fast 😬) and I'm sure it can be easily adapted.
import numpy as np
from PIL import Image
def salt_and_pepper(image, prob=0.05):
    # If the specified `prob` is negative or zero, we don't need to do anything.
    if prob <= 0:
        return image
    arr = np.asarray(image)
    original_dtype = arr.dtype
    # Derive the number of intensity levels from the array datatype.
    intensity_levels = 2 ** (arr[0, 0].nbytes * 8)
    min_intensity = 0
    max_intensity = intensity_levels - 1
    # Generate an array with the same shape as the image's:
    # Each entry will have:
    #   1 with probability: 1 - prob
    #   0 or np.nan (50% each) with probability: prob
    random_image_arr = np.random.choice(
        [min_intensity, 1, np.nan], p=[prob / 2, 1 - prob, prob / 2], size=arr.shape
    )
    # This results in an image array with the following properties:
    # - With probability 1 - prob: the pixel KEEPS ITS VALUE (it was multiplied by 1)
    # - With probability prob/2: the pixel has value zero (it was multiplied by 0)
    # - With probability prob/2: the pixel has value np.nan (it was multiplied by np.nan)
    # We need `arr.astype(float)` to make sure np.nan is a valid value
    # (the `np.float` alias was removed from recent NumPy releases).
    salt_and_peppered_arr = arr.astype(float) * random_image_arr
    # Since we want SALT instead of NaN, we replace it.
    # We cast the array back to its original dtype so we can pass it to PIL.
    salt_and_peppered_arr = np.nan_to_num(
        salt_and_peppered_arr, nan=max_intensity
    ).astype(original_dtype)
    return Image.fromarray(salt_and_peppered_arr)
You can load a black and white version of Lena like so:
lena = Image.open("lena.ppm")
bwlena = Image.fromarray(np.asarray(lena).mean(axis=2).astype(np.uint8))
Finally, you can save a couple of examples:
salt_and_pepper(bwlena, prob=0.1).save("sp01lena.png", "PNG")
salt_and_pepper(bwlena, prob=0.3).save("sp03lena.png", "PNG")
Results:
https://i.ibb.co/J2y9HXS/sp01lena.png
https://i.ibb.co/VTm5Vy2/sp03lena.png
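A variant of the same idea that avoids the detour through NaN is to draw one uniform random number per pixel and use two boolean masks. This is only a sketch under stated assumptions: the function name salt_and_pepper_mask and the seed parameter are made up for illustration, and it assumes an integer dtype such as uint8 and works on the NumPy array rather than on a PIL image.

```python
import numpy as np

def salt_and_pepper_mask(arr, prob=0.05, seed=None):
    """Set about prob/2 of the pixels to 0 and about prob/2 to the dtype maximum."""
    rng = np.random.default_rng(seed)
    out = np.array(arr, copy=True)
    draws = rng.random(out.shape)
    # Assumes an integer dtype; np.iinfo raises an error for float arrays.
    max_intensity = np.iinfo(out.dtype).max
    out[draws < prob / 2] = 0                  # pepper
    out[draws > 1 - prob / 2] = max_intensity  # salt
    return out

gray = np.full((200, 200), 128, dtype=np.uint8)
noisy = salt_and_pepper_mask(gray, prob=0.1, seed=0)
```

The result keeps the dtype of the input, so it can be passed to Image.fromarray directly.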
I have the following code that uses Python and OpenCV. Briefly, I have a stack of images taken at different focal depths. The code picks out, at every (x, y) position, the pixel that has the largest Laplacian of Gaussian (LoG) response among all focal depths (z), thus creating a focus-stacked image. The function get_fmap creates a 2D array in which each pixel contains the index of the focal plane with the largest LoG response. In the following code, the lines that are commented out are my current VIPS implementation. They don't fit within the function definition because they are only a partial solution.
# from gi.repository import Vips
import sys
import cv2
import numpy as np
def get_log_kernel(siz, std):
    x = y = np.linspace(-siz, siz, 2*siz+1)
    x, y = np.meshgrid(x, y)
    arg = -(x**2 + y**2) / (2*std**2)
    h = np.exp(arg)
    h[h < sys.float_info.epsilon * h.max()] = 0
    h = h/h.sum() if h.sum() != 0 else h
    h1 = h*(x**2 + y**2 - 2*std**2) / (std**4)
    return h1 - h1.mean()
def get_fmap(img): # img is a 3-d numpy array.
    log_response = np.zeros_like(img[:, :, 0], dtype='single')
    fmap = np.zeros_like(img[:, :, 0], dtype='uint8')
    log_kernel = get_log_kernel(11, 2)
    # kernel = get_log_kernel(11, 2)
    # kernel = [list(row) for row in kernel]
    # kernel = Vips.Image.new_from_array(kernel)
    # img = Vips.new_from_file("testimg.tif")
    for ii in range(img.shape[2]):
        # img_filtered = img.conv(kernel)
        img_filtered = cv2.filter2D(img[:, :, ii].astype('single'), -1, log_kernel)
        index = img_filtered > log_response
        log_response[index] = img_filtered[index]
        fmap[index] = ii
    return fmap
and then fmap will be used to pick out pixels from different focal planes to create a focus-stacked image.
This is done on an extremely large image, and I feel VIPS might do a better job than OpenCV on this. However, the official documentation provides rather scant information on its Python binding. From the information I can find on the internet, I'm only able to make image convolution work (which, in my case, is an order of magnitude faster than OpenCV). I'm wondering how to implement this in VIPS, especially these lines:
log_response = np.zeros_like(img[:, :, 0], dtype = 'single')
index = img_filtered > log_response
log_response[index] = img_filtered[index]
fmap[index] = ii
log_response and fmap are initialized as 3D arrays in the question code, whereas the question text states that the output, fmap is a 2D array. So, I am assuming that log_response and fmap are to be initialized as 2D arrays with their shapes same as each image. Thus, the edits would be -
log_response = np.zeros_like(img[:,:,0], dtype='single')
fmap = np.zeros_like(img[:,:,0], dtype='uint8')
Now, back to the theme of the question: you are performing 2D filtering on each image one by one and getting the index of the maximum filtered output across all stacked images. In case you didn't know, per the documentation of cv2.filter2D, it can also be applied to a multi-channel array, giving a multi-channel array as output. Then, getting the index of the maximum across all images is as simple as .argmax(2). Thus, the implementation should be very efficient and would simply be -
fmap = cv2.filter2D(img,-1,log_kernel).argmax(2)
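One difference between the loop in the question and the argmax one-liner is worth checking: the loop starts from a response of zero, so a pixel whose responses are all negative keeps fmap = 0, whereas argmax returns the plane with the largest (least negative) response. The sketch below shows this with NumPy only; the random array stands in for an already filtered stack, and the helper name fmap_loop is made up for illustration.

```python
import numpy as np

def fmap_loop(responses):
    """Running-maximum loop from the question, applied to a filtered stack."""
    best = np.zeros(responses.shape[:2], dtype=responses.dtype)
    fmap = np.zeros(responses.shape[:2], dtype=np.uint8)
    for ii in range(responses.shape[2]):
        plane = responses[:, :, ii]
        index = plane > best
        best[index] = plane[index]
        fmap[index] = ii
    return fmap

rng = np.random.default_rng(0)
# Stand-in for the LoG-filtered planes: shape (rows, cols, n_planes).
responses = rng.normal(size=(64, 64, 6)).astype(np.float32)

loop_map = fmap_loop(responses)
argmax_map = responses.argmax(axis=2)
# Pixels where at least one plane has a positive response.
positive = responses.max(axis=2) > 0
```

Wherever positive is True the two maps agree; elsewhere the loop leaves 0. Whether that difference matters depends on the data.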
After consulting the Python VIPS manual and some trial-and-error, I've come up with my own answer. My numpy and OpenCV implementation in question can be translated into VIPS like this:
import pyvips
img = []
for ii in range(num_z_levels):
    img.append(pyvips.Image.new_from_file("testimg_z" + str(ii) + ".tif"))
def get_fmap(img):
    log_kernel = get_log_kernel(11, 2)  # get_log_kernel is my own function, which generates a 2-d numpy array.
    log_kernel = [list(row) for row in log_kernel]  # pyvips.Image.new_from_array takes a list of row lists.
    log_kernel = pyvips.Image.new_from_array(log_kernel)  # Turn the kernel into a Vips image so it can be used by Vips.
    log_response = img[0].conv(log_kernel)
    fmap = pyvips.Image.black(img[0].width, img[0].height)  # all zeros: plane 0 wins until beaten
    for ii in range(1, len(img)):
        img_filtered = img[ii].conv(log_kernel)
        index = img_filtered > log_response  # compare before log_response is updated
        log_response = index.ifthenelse(img_filtered, log_response)
        fmap = index.ifthenelse(ii, fmap)
    return fmap
Logical indexing is achieved through the ifthenelse method:
result_img = (test_condition).ifthenelse(value_if_true, value_if_false)
The syntax is rather flexible. The test condition can be a comparison between two images of the same size or between an image and a value, e.g. img1 > img2 or img > 5. Likewise, value_if_true can be a single value or a Vips image.
I already achieved the goal described in the title but I was wondering if there was a more efficient (or generally better) way to do it. First of all let me introduce the problem.
I have a set of images of different sizes but with a width/height ratio less than or equal to 2 (it could be anything, but let's say 2 for now). I want to normalize each one, meaning I want all of them to have the same size. Specifically, I am going to do it like this:
1) Extract the max height among all images
2) Zoom each image so that it reaches the max height while keeping its ratio
3) Add padding to the right with just white pixels until the image has a width/height ratio of 2
Keep in mind the images are represented as numpy matrices of grey scale values [0,255].
This is how I'm doing it now in Python:
max_height = numpy.max([len(obs) for obs in data if len(obs[0])/len(obs) <= 2])
for obs in data:
    if len(obs[0])/len(obs) <= 2:
        new_img = ndimage.zoom(obs, round(max_height/len(obs), 2), order=3)
        missing_cols = max_height * 2 - len(new_img[0])
        norm_img = []
        for row in new_img:
            norm_img.append(np.pad(row, (0, missing_cols), mode='constant', constant_values=255))
        norm_img = np.resize(norm_img, (max_height, max_height*2))
There's a note about this code:
I'm rounding the zoom ratio because it makes the final height equal to max_height. I'm sure this is not the best approach, but it's working (any suggestion is appreciated here). What I'd like to do is expand the image, keeping the ratio, until it reaches a height equal to max_height. This is the only solution I have found so far; it worked right away and the interpolation works pretty well.
So my final questions are:
Is there a better approach to achieve what is explained above (image normalization)? Do you think I could have done this differently? Is there a common good practice I'm not following?
Thanks in advance for your time.
Instead of ndimage.zoom you could use scipy.misc.imresize. This function allows you to specify the target size as a tuple, instead of by zoom factor. Thus you won't have to call np.resize later to get the size exactly as desired.
Note that scipy.misc.imresize calls PIL.Image.resize under the hood, so PIL (or Pillow) is a dependency. Also note that scipy.misc.imresize was deprecated in SciPy 1.0 and removed in SciPy 1.3, so this only works with older SciPy versions.
Instead of using np.pad in a for-loop, you could allocate space for the desired array, norm_arr, first:
norm_arr = np.full((max_height, max_width), fill_value=255)
and then copy the resized image, new_arr into norm_arr:
nh, nw = new_arr.shape
norm_arr[:nh, :nw] = new_arr
For example,
from __future__ import division
import numpy as np
from scipy import misc
data = [np.linspace(255, 0, i*10).reshape(i,10)
        for i in range(5, 100, 11)]
max_height = np.max([len(obs) for obs in data if len(obs[0])/len(obs) <= 2])
max_width = 2*max_height
result = []
for obs in data:
    norm_arr = obs
    h, w = obs.shape
    if float(w)/h <= 2:
        scale_factor = max_height/float(h)
        target_size = (max_height, int(round(w*scale_factor)))
        new_arr = misc.imresize(obs, target_size, interp='bicubic')
        norm_arr = np.full((max_height, max_width), fill_value=255)
        # check the shapes
        # print(obs.shape, new_arr.shape, norm_arr.shape)
        nh, nw = new_arr.shape
        norm_arr[:nh, :nw] = new_arr
    result.append(norm_arr)
    # visually check the result
    # misc.toimage(norm_arr).show()
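On SciPy versions where scipy.misc.imresize is not available, a similar result can be sketched with ndimage.zoom alone. The point of the sketch is that passing the unrounded factor max_height / h makes the output height come out as max_height, so no np.resize is needed afterwards. The helper name normalize_height, the clipping to [0, 255] (cubic interpolation can overshoot) and the float output dtype are assumptions made for this example.

```python
import numpy as np
from scipy import ndimage

def normalize_height(obs, max_height, fill_value=255):
    """Zoom obs to max_height rows, then pad on the right to width 2 * max_height."""
    h, w = obs.shape
    factor = max_height / h  # not rounded, so rows come out as max_height
    resized = ndimage.zoom(obs.astype(float), factor, order=3)
    resized = np.clip(resized, 0, 255)  # cubic splines can overshoot the range
    out = np.full((max_height, 2 * max_height), fill_value, dtype=float)
    # Guard against an off-by-one from rounding inside zoom.
    nh = min(resized.shape[0], max_height)
    nw = min(resized.shape[1], 2 * max_height)
    out[:nh, :nw] = resized[:nh, :nw]
    return out

data = [np.linspace(255, 0, i * 10).reshape(i, 10) for i in range(5, 100, 11)]
max_height = max(len(obs) for obs in data if obs.shape[1] / obs.shape[0] <= 2)
results = [normalize_height(obs, max_height) for obs in data]
```

Every array in results has shape (max_height, 2 * max_height), with white (255) padding to the right of the zoomed image.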
The image (test.tif) is attached.
The np.nan values are the whitest regions.
How can I fill those regions using a gap-filling algorithm that uses values from the neighbours?
from scipy import ndimage
data = ndimage.imread('test.tif')
As others have suggested, scipy.interpolate can be used. However, it requires fairly extensive index manipulation to get this to work.
Complete example:
from pylab import *
import numpy
import scipy.ndimage
import scipy.interpolate
data = scipy.ndimage.imread('data.png')
# a boolean array of shape (height, width) which is False where there are missing values and True where there are valid (non-missing) values
mask = ~( (data[:,:,0] == 255) & (data[:,:,1] == 255) & (data[:,:,2] == 255) )
# array of (number of points, 2) containing the x,y coordinates of the valid values only
xx, yy = numpy.meshgrid(numpy.arange(data.shape[1]), numpy.arange(data.shape[0]))
xym = numpy.vstack( (numpy.ravel(xx[mask]), numpy.ravel(yy[mask])) ).T
# the valid values in the first, second, third color channel, as 1D arrays (in the same order as their coordinates in xym)
data0 = numpy.ravel( data[:,:,0][mask] )
data1 = numpy.ravel( data[:,:,1][mask] )
data2 = numpy.ravel( data[:,:,2][mask] )
# three separate interpolators for the separate color channels
interp0 = scipy.interpolate.NearestNDInterpolator( xym, data0 )
interp1 = scipy.interpolate.NearestNDInterpolator( xym, data1 )
interp2 = scipy.interpolate.NearestNDInterpolator( xym, data2 )
# interpolate the whole image, one color channel at a time
result0 = interp0(numpy.ravel(xx), numpy.ravel(yy)).reshape( xx.shape )
result1 = interp1(numpy.ravel(xx), numpy.ravel(yy)).reshape( xx.shape )
result2 = interp2(numpy.ravel(xx), numpy.ravel(yy)).reshape( xx.shape )
# combine them into an output image
result = numpy.dstack( (result0, result1, result2) )
imshow(result)
show()
Output:
This passes to the interpolator all values we have, not just the ones next to the missing values (which may be somewhat inefficient). It also interpolates every point in the output, not just the missing values (which is extremely inefficient). A better way is to interpolate just the missing values, and then patch them into the original image. This is just a quick working example to get started :)
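The improvement described in the last paragraph (interpolate only at the missing pixels, then patch them into the original) could look roughly like this for a single-channel image in which missing values are NaN. The function name fill_nan_nearest is made up for illustration, and the sketch still passes every valid pixel to the interpolator.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

def fill_nan_nearest(data):
    """Replace NaNs in a 2-D array by the value of the nearest valid pixel."""
    missing = np.isnan(data)
    if not missing.any() or missing.all():
        return data.copy()
    # (row, col) coordinates of the valid pixels and their values, in the same order.
    valid_rc = np.argwhere(~missing)
    interp = NearestNDInterpolator(valid_rc, data[~missing])
    filled = data.copy()
    # Evaluate only at the missing positions and patch them in.
    filled[missing] = interp(np.argwhere(missing))
    return filled

demo = np.arange(25, dtype=float).reshape(5, 5)
demo[2, 2] = np.nan
demo[0, 0] = np.nan
filled = fill_nan_nearest(demo)
```

For an RGB image the same function can be applied to each channel separately, as in the example above.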
I think viena's question is more related to an inpainting problem.
Here are some ideas:
In order to fill the gaps in B/W images you can use some filling algorithm like scipy.ndimage.morphology.binary_fill_holes. But you have a gray level image, so you can't use it.
I suppose that you don't want to use a complex inpainting algorithm. My first suggestion is: don't try to use the nearest gray value (you don't know the real value of the NaN pixels). Using the nearest value will give a dirty algorithm. Instead, I would suggest filling the gaps with some other value (e.g. the mean of the column, which is what the example below computes). You can do it with very little code by using scikit-learn:
Source:
>>> from sklearn.impute import SimpleImputer  # replaces the removed sklearn.preprocessing.Imputer
>>> imp = SimpleImputer(strategy="mean")
>>> a = np.random.random((5,5))
>>> a[(1,4,0,3),(2,4,2,0)] = np.nan
>>> a
array([[ 0.77473361, 0.62987193, nan, 0.11367791, 0.17633671],
[ 0.68555944, 0.54680378, nan, 0.64186838, 0.15563309],
[ 0.37784422, 0.59678177, 0.08103329, 0.60760487, 0.65288022],
[ nan, 0.54097945, 0.30680838, 0.82303869, 0.22784574],
[ 0.21223024, 0.06426663, 0.34254093, 0.22115931, nan]])
>>> a = imp.fit_transform(a)
>>> a
array([[ 0.77473361, 0.62987193, 0.24346087, 0.11367791, 0.17633671],
[ 0.68555944, 0.54680378, 0.24346087, 0.64186838, 0.15563309],
[ 0.37784422, 0.59678177, 0.08103329, 0.60760487, 0.65288022],
[ 0.51259188, 0.54097945, 0.30680838, 0.82303869, 0.22784574],
[ 0.21223024, 0.06426663, 0.34254093, 0.22115931, 0.30317394]])
The dirty solution that uses the Nearest values can be this:
1) Find the perimeter points of the NaN regions
2) Compute all the distances between the NaN points and the perimeter
3) Replace the NaNs with the nearest point's gray value
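The three steps above can be sketched in a few lines with scipy.ndimage.distance_transform_edt, which can return, for every pixel, the index of the nearest "background" pixel. Here the NaN pixels are treated as foreground and the valid pixels as background; the function name fill_nan_edt is made up for illustration.

```python
import numpy as np
from scipy import ndimage

def fill_nan_edt(data):
    """Fill NaNs with the value of the nearest valid pixel (Euclidean distance)."""
    missing = np.isnan(data)
    # For each pixel, the indices of the nearest pixel where `missing` is False.
    idx = ndimage.distance_transform_edt(
        missing, return_distances=False, return_indices=True
    )
    return data[tuple(idx)]

demo = np.arange(25, dtype=float).reshape(5, 5)
demo[2, 2] = np.nan
filled = fill_nan_edt(demo)
```

Valid pixels map to themselves, so only the NaN positions change.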
If you want values from the nearest neighbors, you could use the NearestNDInterpolator from scipy.interpolate. There are also other interpolators as well you can consider.
You can locate the X,Y index values for the NaN values with:
import numpy as np
nan_locs = np.where(np.isnan(data))
There are some other options for the interpolation as well. One option is to replace NaN values with the results of a median filter (but your areas are kind of large for this). Another option might be grayscale dilation. The correct interpolation depends on your end domain.
If you haven't used a SciPy ND interpolator before: you first provide X, Y, and value data to fit the interpolator, and then the X and Y coordinates of the points to interpolate at. You can do this using the where example above as a template.
OpenCV has some image in-painting algorithms that you could use. You just need to provide a binary mask which indicates which pixels should be in-painted.
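import cv2
import numpy as np
from scipy import ndimage
data = ndimage.imread("test.tif")
# cv2.inpaint expects an 8-bit, single-channel mask; non-zero pixels are in-painted
mask = np.isnan(data).astype(np.uint8)
inpainted_img = cv2.inpaint(data, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)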
Variance image in gdal
I want a local variance image with a 3x3 window of a geospatial raster image using Python. My approach so far was to read in the raster band as an array, then use matrix notation to run a moving window and write the array into a new raster image. This approach worked well for a high pass filter as described in this tutorial: http://www.gis.usu.edu/~chrisg/python/2009/lectures/ospy_slides6.pdf
Then I tried to calculate the variance with several approaches, the last one using numpy (as np), but I just get a gray image with the same value everywhere.
I am open to any kind of solution. If it gives me the average local variance in the end, that would be even better.
rows = srcDS.RasterYSize
#read in as array
data = srcBand.ReadAsArray(0,0, cols, rows).astype(int)
#calculate the variance for a 3x3 window
outVariance = np.zeros((rows, cols), float)
outVariance[1:rows-1,1:cols-1] = np.var([(data[0:rows-2,0:cols-2]),
(data[0:rows-2,1:cols-1]),
(data[0:rows-2,2:cols] ),
(data[1:rows-1,0:cols-2]),
(data[1:rows-1,1:cols-1]),
(data[1:rows-1,2:cols] ),
(data[2:rows,0:cols-2] ),
(data[2:rows,1:cols-1] ),
(data[2:rows,2:cols] )])
#output
outDS = driver.Create(outFN, cols, rows, 1, GDT_Float32)
outDS.SetGeoTransform(srcDS.GetGeoTransform())
outDS.SetProjection(srcDS.GetProjection())
outBand = outDS.GetRasterBand(1)
outBand.WriteArray(outVariance,0,0)
...
You could try Scipy, it has a function for running local filters on an array.
from scipy import ndimage
outVariance = ndimage.generic_filter(data, np.var, size=3)
It has a 'mode=' keyword for how the edges should be handled.
edit:
You can test it yourself, declare a 3x3 array:
a = np.random.rand(3,3)
a
[[ 0.01869967 0.14037373 0.32960675]
[ 0.17213158 0.35287243 0.13498175]
[ 0.29511881 0.46387688 0.89359801]]
For a 3x3 window, the variance of the center cell of the array will simply be:
print(np.var(a))
0.058884734425985602
That value should be equal to the center cell of the returned array by Scipy:
print(ndimage.generic_filter(a, np.var, size=3))
print(ndimage.generic_filter(a, np.var, size=(3,3)))
print(ndimage.generic_filter(a, np.var, footprint=np.ones((3,3))))
[[ 0.01127325 0.01465338 0.00959321]
[ 0.02001052 0.05888473 0.07897385]
[ 0.00978547 0.06966683 0.09633447]]
Note that all other values in the array are 'edge-values' so the result depends on how Scipy handles the edges. It defaults to mode='reflect'.
See the documentation for more detailed information:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.filters.generic_filter.html
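As for why the code in the question produces the same value everywhere: np.var without an axis argument reduces over the whole stack of nine shifted slices and returns a single number. Passing axis=0 computes the variance across the nine slices for each pixel instead. A minimal sketch, with a random array standing in for the raster band:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
data = rng.integers(0, 255, size=(20, 30)).astype(float)  # stand-in for the band
rows, cols = data.shape

# The nine shifted views that together cover each pixel's 3x3 neighbourhood.
shifted = [data[r:rows - 2 + r, c:cols - 2 + c]
           for r in range(3) for c in range(3)]

out_variance = np.zeros((rows, cols), dtype=float)
# axis=0 reduces over the nine slices and leaves one variance per interior pixel.
out_variance[1:rows - 1, 1:cols - 1] = np.var(shifted, axis=0)

reference = ndimage.generic_filter(data, np.var, size=3)
```

The interior of out_variance matches the generic_filter result; only the one-pixel border differs, because it is left at zero here.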
A simpler and also faster solution: use uniform_filter
and a "variance trick" explained here: http://imagej.net/Integral_Image_Filters (the variance is the mean of the squares minus the square of the mean)
import numpy as np
from scipy import ndimage
rows, cols = 500, 500
win_rows, win_cols = 5, 5
img = np.random.rand(rows, cols)
win_mean = ndimage.uniform_filter(img,(win_rows,win_cols))
win_sqr_mean = ndimage.uniform_filter(img**2,(win_rows,win_cols))
win_var = win_sqr_mean - win_mean**2
generic_filter is 40 times slower than the strides approach...