Fast Way to Perform Array Computation in Python

I have an image that I want to perform some calculations on. The image pixels will be represented as f(x, y) where x is the column number and y is the row number of each pixel. I want to perform a calculation using the following formula:
D_h(x, y) = |f(x, y + 1) - f(x, y - 1)|
Here is the code that does the calculation:
import matplotlib.pyplot as plt
import numpy as np
import os.path
from PIL import Image
global image_width, image_height
# A. Blur Measurement
def measure_blur(f):
    D_sub_h = [[0 for y in range(image_height)] for x in range(image_width)]
    for x in range(image_width):
        for y in range(image_height):
            if(y == 0):
                # no previous neighbour: treat it as 0
                f_x_yp1 = f[x][y+1]
                f_x_ym1 = 0
            elif(y == (image_height -1)):
                # no next neighbour: treat it as 0
                f_x_yp1 = 0
                f_x_ym1 = f[x][y -1]
            else:
                f_x_yp1 = f[x][y+1]
                f_x_ym1 = f[x][y -1]
            D_sub_h[x][y] = abs(f_x_yp1 - f_x_ym1)
    return D_sub_h
if __name__ == '__main__':
    image_counter = 1
    while True:
        # stop at the first missing file: 1.jpg, 2.jpg, ...
        if not os.path.isfile(str (image_counter) + '.jpg'):
            break
        image_path = str(image_counter) + '.jpg'
        image = Image.open(image_path)
        image_height, image_width = image.size
        print("Image Width : " + str(image_width))
        print("Image Height : " + str(image_height))
        f = np.array(image)
        D_sub_h = measure_blur(f)
        image_counter = image_counter + 1
The problem with this code is when the image size becomes large, such as (5000, 5000), it takes a very long time to complete. Is there any way or function I can use to make the execution time faster by not doing one by one or manual computation?

Since you specifically convert the input f to a numpy array, I am assuming you want to use numpy. In that case, the allocation of D_sub_h needs to change from a list to an array:
D_sub_h = np.empty_like(f)
If we assume that everything outside your array is zeros, then the first row and last row can be computed as the second and negative second-to-last rows, respectively:
D_sub_h[0, :] = f[1, :]
D_sub_h[-1, :] = -f[-2, :]
The remainder of the data is just the difference between the next and previous index at each location, which is idiomatically computed by shifting views: f[2:, :] - f[:-2, :]. This formulation creates a temporary array. You can avoid doing that by using np.subtract explicitly:
np.subtract(f[2:, :], f[:-2, :], out=D_sub_h[1:-1, :])
The entire thing takes four lines in this formulation, and is fully vectorized, which means that loops run quickly under the hood, without most of Python's overhead:
def measure_blur(f):
    D_sub_h = np.empty_like(f)
    D_sub_h[0, :] = f[1, :]
    D_sub_h[-1, :] = -f[-2, :]
    np.subtract(f[2:, :], f[:-2, :], out=D_sub_h[1:-1, :])
    return D_sub_h
Notice that I return the value instead of printing it. When you write functions, get in the habit of returning a value. Printing can be done later, and effectively discards the computation if it replaces a proper return.
The way shown above is fairly efficient with regards to time and space. If you want to write a one liner that uses a lot of temporary arrays, you can also do:
D_sub_h = np.concatenate((f[1, None], f[2:, :] - f[:-2, :], -f[-2, None]), axis=0)
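The code in the question differences along the second index and takes the absolute value, and an image loaded through PIL is usually uint8, where a negative difference wraps around. A minimal sketch of a variant that covers those points, assuming a 2-D grayscale array with at least two columns (the name measure_blur_abs is made up for this example):

```python
import numpy as np

def measure_blur_abs(f):
    """|f[:, y+1] - f[:, y-1]| with everything outside the array taken as 0."""
    # Assumption: f is 2-D. Cast to a signed type so uint8 input cannot wrap.
    f = np.asarray(f).astype(np.int32)
    d = np.empty_like(f)
    d[:, 0] = f[:, 1]        # first column: previous neighbour is 0
    d[:, -1] = -f[:, -2]     # last column: next neighbour is 0
    np.subtract(f[:, 2:], f[:, :-2], out=d[:, 1:-1])
    return np.abs(d)

sample = np.array([[1, 5, 2],
                   [9, 0, 4]], dtype=np.uint8)
result = measure_blur_abs(sample)
```

Swap the axes in the slices if the difference should run along rows instead, as in the answer above.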


Apply a kernel to selected pixels only within an image

I'm trying to build a customized DCGAN in PyTorch for a project. The custom part is my own gaussian blur filter on the end of the generator that blurs only pixel values over a certain threshold. The issue is when I add the filter to the forward pass of the generator, the backward() call takes much longer. Without the filter, it's pretty much instant; with the filter, it takes over a minute.
I'm assuming it's the big loop iterating through the image that's causing the problem, is there a way around this?
The forward pass of the generator:
def forward(self, x):
    x = self.gen(x)
    x = convolve2D(x)
    return x
The convolve2D function takes a batch and iterates through it selectively applying the filter:
def convolve2D(batch, padding=0, strides=1):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    kernel = torch.tensor(([1, 2, 1], [2, 4, 2], [1, 2, 1])).to(device)
    kernel_sum = kernel.sum()
    # Gather Shapes of Kernel + Image + Padding
    xKernShape = kernel.shape[0]
    yKernShape = kernel.shape[1]
    xImgShape = batch.shape[2]
    yImgShape = batch.shape[3]
    copy = batch.clone()
    for i, image in tqdm(enumerate(copy)):
        for j, channel in enumerate(image):
            # Apply Equal Padding to All Sides
            if padding != 0:
                channelPadded = torch.zeros((channel.shape[0] + padding*2, channel.shape[1] + padding*2))
                channelPadded[int(padding):int(-1 * padding), int(padding):int(-1 * padding)] = channel
            else:
                channelPadded = channel
            # Iterate through image
            for y in range(yImgShape):
                # Exit Convolution
                if y > yImgShape - yKernShape:
                    break
                # Only Convolve if y has gone down by the specified Strides
                if y % strides == 0:
                    for x in range(xImgShape):
                        # Go to next row once kernel is out of bounds
                        if x > xImgShape - xKernShape:
                            break
                        # Ignore if pixel is an edge
                        if channel[x + 1, y + 1] < 0:
                            continue
                        # Only Convolve if x has moved by the specified Strides
                        if x % strides == 0:
                            batch[i][j][x + 1, y + 1] = torch.mul(kernel, channelPadded[x: x + xKernShape, y: y + yKernShape]).sum() / kernel_sum
    return batch
Yes, for backprop in PyTorch a loop over the x and y coordinates of the image will be unbearably slow. Let original be batch[i][j] and blurred be the result when the kernel is applied non-selectively over original in its entirety. For simplicity I'm omitting the code for this, as it is trivial as long as the padding results in original and blurred having the same size.
Now that you have convolved the kernel over the entire image, simply choose elements from either blurred or original based on the value in original.
output = torch.where(original > threshold, blurred, original)
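For reference, a minimal sketch of the whole-batch version, assuming a float tensor of shape (N, C, H, W), the 3x3 kernel from the question, zero padding of 1 so the sizes match, and a threshold of 0 (the function name selective_blur and the threshold default are assumptions made for this example):

```python
import torch
import torch.nn.functional as F

def selective_blur(batch, threshold=0.0):
    """Blur every pixel, then keep the blurred value only where batch > threshold."""
    channels = batch.shape[1]
    kernel = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]],
                          dtype=batch.dtype, device=batch.device)
    kernel = kernel / kernel.sum()
    # One copy of the kernel per channel; groups=channels blurs each channel
    # independently (a depthwise convolution).
    weight = kernel.view(1, 1, 3, 3).repeat(channels, 1, 1, 1)
    blurred = F.conv2d(batch, weight, padding=1, groups=channels)
    return torch.where(batch > threshold, blurred, batch)

example = torch.ones(1, 2, 5, 5)
blurred_all = selective_blur(example)               # every pixel is above 0
untouched = selective_blur(example, threshold=2.0)  # no pixel is above 2
```

Both conv2d and torch.where are differentiable, so autograd handles the backward pass without any Python-level loop over pixels.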

Speed up nested for loop with NumPy

I'm trying to write an image-processing package using NumPy operations. I've observed that the operations inside the nested loop are costly and want to speed them up.
The input is a 512 by 1024 image, which is preprocessed into an edge set: a list of (Ni, 2) ndarrays, one for each edge i.
Next, the nested for loop takes the edge set and does some math.
### preprocessing: img ===> contour set
img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
high_thresh, _ = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY +
lowThresh = 0.5*high_thresh
b = cv2.Canny(img, lowThresh, high_thresh)
edgeset, _ =
imgH = img.shape[0] ## 512
imgW = img.shape[1] ## 1024
num_edges = len(edgeset) ## ~900
min_length_segment_vp = imgH/6 ## ~100
### nested for loop
for i in range(num_edges):
    if(edgeset[i].shape[0] > min_length_segment_vp):
        #points: (N, 1, 2) ==> uv: (N, 2)
        uv = edgeset[i].reshape(edgeset[i].shape[0], 2)
        uv = np.unique(uv, axis=0)
        theta = -(uv[:, 1]-imgH/2)*np.pi/imgH
        phi = (uv[:, 0]-imgW/2)*2*np.pi/imgW
        xyz = np.zeros((uv.shape[0], 3))
        xyz[:, 0] = np.sin(phi) * np.cos(theta)
        xyz[:, 1] = np.cos(theta) * np.cos(phi)
        xyz[:, 2] = np.sin(theta)
        ##xyz: (N, 3)
        for _ in range(10):
            if(xyz.shape[0] > N * 0.1):
                bestInliers = np.array([])
                bestOutliers = np.array([])
                #### watch this out!
                for _ in range(1000):
                    id0 = random.randint(0, xyz.shape[0]-1)
                    id1 = random.randint(0, xyz.shape[0]-1)
                    if(id0 == id1):
                        continue
                    n = np.cross(xyz[id0, :], xyz[id1, :])
                    n = n / np.linalg.norm(n)
                    cosTetha = n @ xyz.T
                    inliers = np.abs(cosTetha) < threshold
                    outliers = np.where(np.invert(inliers))[0]
                    inliers = np.where(inliers)[0]
                    if inliers.shape[0] > bestInliers.shape[0]:
                        bestInliers = inliers
                        bestOutliers = outliers
What I have tried:
I replaced np.cross and np.linalg.norm with my own cross and norm functions that only work for ndarrays of shape (3,). This brought the runtime down from ~0.9s to ~0.3s on my i5-4460 CPU.
I profiled my code and found that the code inside the innermost loop still accounts for 2/3 of the time.
What I think I can try next:
Compile the code with Cython and add some cdef annotations.
Translate the whole file into C++.
Use a faster library for the calculation, such as numexpr.
Vectorize the loop (but I don't know how).
Can I make it even faster? Please give me some suggestions! Thanks!
The question is quite broad so I'll only give a few non-obvious tips based on my own experience.
If you use Cython, you might want to change the for loops into while loops. I've managed to get quite big (x5) speed-ups just from this, although it may not help for all possible cases;
Sometimes code that would be considered inefficient in regular Python, such as a nested while (or for) loop to apply a function to an array one element at a time, can be optimized by Cython to be faster than the equivalent vectorized Numpy approach;
Find out which Numpy functions cost the most time, and write your own in a way that Cython can most easily optimise them (see above point).
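On the vectorization option from the question: the 1000 random trials are independent of each other, so they can be drawn and scored in one batch. A rough NumPy sketch, assuming xyz is an (N, 3) array of unit vectors and threshold is the same cosine cut-off as in the question (the function name and the rng parameter are made up for this example):

```python
import numpy as np

def best_plane_inliers(xyz, threshold, n_trials=1000, rng=None):
    """Score n_trials random point pairs at once; return (inliers, outliers) of the best."""
    rng = np.random.default_rng() if rng is None else rng
    n_points = xyz.shape[0]
    id0 = rng.integers(0, n_points, size=n_trials)
    id1 = rng.integers(0, n_points, size=n_trials)
    keep = id0 != id1                                   # same role as the `continue`
    normals = np.cross(xyz[id0[keep]], xyz[id1[keep]])  # (T, 3)
    norms = np.linalg.norm(normals, axis=1)
    valid = norms > 0                                   # drop degenerate pairs
    normals = normals[valid] / norms[valid, None]
    if normals.shape[0] == 0:
        return np.array([], dtype=int), np.arange(n_points)
    inlier_mask = np.abs(normals @ xyz.T) < threshold   # (T, N)
    best = np.argmax(inlier_mask.sum(axis=1))           # first trial with most inliers
    return np.flatnonzero(inlier_mask[best]), np.flatnonzero(~inlier_mask[best])

# Toy data: 20 points on the z = 0 great circle plus 3 points off it.
angles = np.linspace(0.1, 3.0, 20)
circle = np.stack([np.cos(angles), np.sin(angles), np.zeros(20)], axis=1)
extra = np.array([[0.0, 0.0, 1.0], [0.0, 0.6, 0.8], [0.6, 0.0, 0.8]])
points = np.vstack([circle, extra])
inliers, outliers = best_plane_inliers(points, 0.01, rng=np.random.default_rng(0))
```

The intermediate (T, N) matrix trades memory for speed; for very long edges it may be worth scoring the trials in a few chunks instead.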

Is it possible to convert this numpy function to tensorflow?

I have a function that takes a [32, 32, 3] tensor, and outputs a [256,256,3] tensor.
Specifically, the function interprets the smaller array as if it was a .svg file, and 'renders' it to a 256x256 array as a canvas using this algorithm
For an explanation of WHY I would want to do this, see This question
The function behaves exactly as intended, until I try to include it in the training loop of a GAN. The current error I'm seeing is:
NotImplementedError: Cannot convert a symbolic Tensor (mul:0) to a numpy array.
A lot of other answers to similar errors seem to boil down to "You need to re-write the function using tensorflow, not numpy"
Here's the working code using numpy - is it possible to re-write it to exclusively use tensorflow functions?
def convert_to_bitmap(input_tensor, target, j):
    #implied conversion to nparray - the tensorflow docs seem to indicate this is okay, but the error is thrown here when training
    array = input_tensor
    outputArray = target
    output = target
    for i in range(32):
        col = float(array[i,0,j])
        if ((float(array[i,0,0]))+(float(array[i,0,1]))+(float(array[i,0,2]))/3)< 0:
            continue
        #slice only the red channel from the i line, multiply by 255
        red_array = array[i,:,0]*255
        #slice only the green channel, multiply by 255
        green_array = array[i,:,1]*255
        #combine and flatten them
        combined_array = np.dstack((red_array, green_array)).flatten()
        #remove the first two and last two indices of the combined array
        index = [0,1,62,63]
        clipped_array = np.delete(combined_array,index)
        #filter array to remove values less than 0
        filtered = clipped_array > 0
        filtered_array = clipped_array[filtered]
        #check array has an even number of values, delete the last index if it doesn't
        if len(filtered_array) % 2 != 0:
            filtered_array = np.delete(filtered_array,-1)
        #convert into a set of tuples
        l = filtered_array.tolist()
        t = list(zip(l, l[1:] + l[:1]))
        if not t:
            continue
        output = fill_polygon(t, outputArray, col)
    return output
The 'fill polygon' function is copied from the 'mahotas' library:
def fill_polygon(polygon, canvas, color):
    if not len(polygon):
        return canvas
    min_y = min(int(y) for y,x in polygon)
    max_y = max(int(y) for y,x in polygon)
    polygon = [(float(y),float(x)) for y,x in polygon]
    if max_y < canvas.shape[0]:
        max_y += 1
    for y in range(min_y, max_y):
        # x positions where the polygon edges cross scanline y
        nodes = []
        j = -1
        for i,p in enumerate(polygon):
            pj = polygon[j]
            if p[0] < y and pj[0] >= y or pj[0] < y and p[0] >= y:
                dy = pj[0] - p[0]
                if dy:
                    nodes.append( (p[1] + (y-p[0])/(pj[0]-p[0])*(pj[1]-p[1])) )
                elif p[0] == y:
                    nodes.append(p[1])
            j = i
        nodes.sort()
        # fill between consecutive pairs of crossings
        for n,nn in zip(nodes[::2],nodes[1::2]):
            nn += 1
            canvas[y, int(n):int(nn)] = color
    return canvas
NOTE: I'm not trying to get someone to convert the whole thing for me! There are some functions that are pretty obvious (tf.stack instead of np.dstack), but others that I don't even know how to start, like the last few lines of the fill_polygon function above.
Yes, you can do this by wrapping the Python function with tf.py_function (tf.py_func in TensorFlow 1.x). It is only a Python wrapper, though, and it is extremely slow compared to plain TensorFlow. TensorFlow and CUDA are fast because they rely on vectorization, which means you can rewrite many of the loops as mathematical tensor operations, and those are very fast.
In general:
If you want to use custom code as a custom layer, I would recommend that you rethink the algebra behind those loops and try to express them differently. If it is just preprocessing before the training starts, you can use TensorFlow, but doing the same with NumPy and other libraries is easier.
To your main question: yes, it's possible, but it's better not to use loops. TensorFlow has a built-in loop optimizer, but then you have to use tf.while_loop, and that's annoying (maybe just for me). I only skimmed your code, but it looks like you should be able to vectorize it quite well using the standard TensorFlow vocabulary. If you want it really fast, with GPU support, write everything in TensorFlow rather than a 50/50 mix with tf.convert_to_tensor(), because then it is going to be slow again: you keep switching between the GPU, the CPU, the plain Python interpreter and the low-level TensorFlow API. I hope this helps at least a bit.
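As an illustration of what rethinking the algebra can look like here, the scanline fill can be replaced by an even-odd crossing test evaluated for every pixel at once. The sketch below uses NumPy for brevity; it is not the mahotas algorithm, the name polygon_mask is made up, and pixels exactly on an edge may be classified differently. Each step (comparison, where, sum, modulo) has a direct TensorFlow counterpart.

```python
import numpy as np

def polygon_mask(polygon, shape):
    """Boolean mask of pixels inside a polygon given as (y, x) vertices (even-odd rule)."""
    poly = np.asarray(polygon, dtype=float)
    y0, x0 = poly[:, 0], poly[:, 1]
    y1, x1 = np.roll(y0, -1), np.roll(x0, -1)    # end point of each edge
    py, px = np.mgrid[0:shape[0], 0:shape[1]]
    py, px = py[..., None], px[..., None]        # (H, W, 1) against (E,) edges
    # Does the edge straddle the horizontal line through the pixel?
    straddles = (y0 > py) != (y1 > py)
    # x coordinate where the edge meets that line (safe for horizontal edges)
    dy = np.where(y1 == y0, 1.0, y1 - y0)
    x_cross = x0 + (py - y0) * (x1 - x0) / dy
    crossings = (straddles & (px < x_cross)).sum(axis=-1)
    return crossings % 2 == 1                    # odd number of crossings: inside

square = [(1, 1), (1, 5), (5, 5), (5, 1)]
mask = polygon_mask(square, (7, 7))
```

The intermediate arrays have shape (H, W, E), which is small for a 256x256 canvas and a few dozen edges.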
This code 'works', in that it only uses tensorflow functions, and does allow the model to train when used in a training loop:
def convert_image (x):
    #split off the first column of the generator output, and store it for later (remove the 'colours' column)
    colours_column = tf.slice(img_to_convert, tf.constant([0,0,0], dtype=tf.int32), tf.constant([32,1,3], dtype=tf.int32))
    #split off the rest of the data, only keeping R + G, and discarding B
    image_data_red = tf.slice(img_to_convert, tf.constant([0,1,0], dtype=tf.int32), tf.constant([32,31,1], dtype=tf.int32))
    image_data_green = tf.slice(img_to_convert, tf.constant([0,1,1], dtype=tf.int32), tf.constant([32, 31,1], dtype=tf.int32))
    #roll each row by 1 position, and make two more 2D tensors
    rolled_red = tf.roll(image_data_red, shift=-1, axis=0)
    rolled_green = tf.roll(image_data_green, shift=-1, axis=0)
    #remove all values where either the red OR green channels are 0
    zeroes = tf.constant(0, dtype=tf.float32)
    #this is for the 'count_nonzero' command
    boolean_red_data = tf.not_equal(image_data_red, zeroes)
    boolean_green_data = tf.not_equal(image_data_green, zeroes)
    initial_data_mask = tf.logical_and(boolean_red_data, boolean_green_data)
    #count non-zero values per row and flatten it
    count = tf.math.count_nonzero(initial_data_mask, 1)
    count_flat = tf.reshape(count, [-1])
    flat_red = tf.reshape(image_data_red, [-1])
    flat_green = tf.reshape(image_data_green, [-1])
    boolean_red = tf.math.logical_not(tf.equal(flat_red, tf.zeros_like(flat_red)))
    boolean_green = tf.math.logical_not(tf.equal(flat_green, tf.zeros_like(flat_red)))
    mask = tf.logical_and(boolean_red, boolean_green)
    flat_red_without_zero = tf.boolean_mask(flat_red, mask)
    flat_green_without_zero = tf.boolean_mask(flat_green, mask)
    # create a ragged tensor
    X0_ragged = tf.RaggedTensor.from_row_lengths(values=flat_red_without_zero, row_lengths=count_flat)
    Y0_ragged = tf.RaggedTensor.from_row_lengths(values=flat_green_without_zero, row_lengths=count_flat)
    #do the same for the rolled version
    rolled_data_mask = tf.roll(initial_data_mask, shift=-1, axis=1)
    flat_rolled_red = tf.reshape(rolled_red, [-1])
    flat_rolled_green = tf.reshape(rolled_green, [-1])
    #from SO "shift zeros to the end"
    boolean_rolled_red = tf.math.logical_not(tf.equal(flat_rolled_red, tf.zeros_like(flat_rolled_red)))
    boolean_rolled_green = tf.math.logical_not(tf.equal(flat_rolled_green, tf.zeros_like(flat_rolled_red)))
    rolled_mask = tf.logical_and(boolean_rolled_red, boolean_rolled_green)
    flat_rolled_red_without_zero = tf.boolean_mask(flat_rolled_red, rolled_mask)
    flat_rolled_green_without_zero = tf.boolean_mask(flat_rolled_green, rolled_mask)
    # create a ragged tensor
    X1_ragged = tf.RaggedTensor.from_row_lengths(values=flat_rolled_red_without_zero, row_lengths=count_flat)
    Y1_ragged = tf.RaggedTensor.from_row_lengths(values=flat_rolled_green_without_zero, row_lengths=count_flat)
    #available outputs for future use are:
    X0 = X0_ragged.to_tensor(default_value=0.)
    Y0 = Y0_ragged.to_tensor(default_value=0.)
    X1 = X1_ragged.to_tensor(default_value=0.)
    Y1 = Y1_ragged.to_tensor(default_value=0.)
    #Example tensor cel (replace with (x))
    P = tf.cast(x, dtype=tf.float32)
    #split out P.x and P.y, and fill a ragged tensor to the same shape as Rx
    Px_value = tf.cast(x, dtype=tf.float32) - tf.cast((tf.math.floor(x/255)*255), dtype=tf.float32)
    Py_value = tf.cast(tf.math.floor(x/255), dtype=tf.float32)
    Px = tf.squeeze(tf.ones_like(X0)*Px_value)
    Py = tf.squeeze(tf.ones_like(Y0)*Py_value)
    #for each pair of values (Y0, Y1, make a vector, and check to see if it crosses the y-value (Py) either up or down
    a = tf.math.less(Y0, Py)
    b = tf.math.greater_equal(Y1, Py)
    c = tf.logical_and(a, b)
    d = tf.math.greater_equal(Y0, Py)
    e = tf.math.less(Y1, Py)
    f = tf.logical_and(d, e)
    g = tf.logical_or(c, f)
    #Makes boolean bitwise mask
    #calculate the intersection of the line with the y-value, assuming it intersects
    #P.x <= (G.x - R.x) * (P.y - R.y) / (G.y - R.y + R.x) - use tf.divide_no_nan for safe divide
    h = tf.math.less(Px,(tf.math.divide_no_nan(((X1-X0)*(Py-Y0)),(Y1-Y0+X0))))
    #combine using AND with the mask above
    i = tf.logical_and(g,h)
    #reshape to make a column tensor with the same dimensions as the colours
    #divide by 2 using tf.floor_mod (returns remainder of division - any remainder means the value is odd, and hence the point is IN the polygon)
    final_count = tf.cast((tf.math.count_nonzero(i, 1)), dtype=tf.int32)
    twos = tf.ones_like(final_count, dtype=tf.int32)*tf.constant([2], dtype=tf.int32)
    divide = tf.cast(tf.math.floormod(final_count, twos), dtype=tf.int32)
    index = tf.cast(tf.range(0,32, delta=1), dtype=tf.int32)
    clipped_index = divide*index
    sort = tf.sort(clipped_index)
    reverse = tf.reverse(sort, [-1])
    value = tf.slice(reverse, [0], [1])
    pair = tf.constant([0], dtype=tf.int32)
    slice_tensor = tf.reshape(tf.stack([value, pair, pair], axis=0),[-1])
    output_colour = tf.slice(colours_column, slice_tensor, [1,1,3])
    return output_colour
This is where the 'convert image' function is applied using tf.vectorized_map:
def convert_images(image_to_convert):
    global img_to_convert
    img_to_convert = image_to_convert
    process_list = tf.reshape((tf.range(0,65536, delta=1, dtype=tf.int32)), [65536, 1])
    output_line = tf.vectorized_map(convert_image, process_list)
    output_line_squeezed = tf.squeeze(output_line)
    output_reshape = (tf.reshape(output_line_squeezed, [256,256,3])/127.5)-1
    output = tf.expand_dims(output_reshape, axis=0)
    return output
It is PAINFULLY slow, though. It does not appear to be using the GPU, and it looks to be single-threaded as well.
I'm adding it as an answer to my own question because it clearly IS possible to do this numpy function entirely in tensorflow; it just probably shouldn't be done like this.

Faster way to threshold a 4-D numpy array

I have a 4D numpy array of size (98,359,256,269) that I want to threshold.
Right now, I have two separate lists that keep the coordinates of the first 2 dimension and the last 2 dimensions. (mag_ang for the first 2 dimensions and indices for the last 2).
size of indices : (61821,2)
size of mag_ang : (35182,2)
Currently, my code looks like this:
inner_points = []
for k in indices:
    x = k[0]
    y = k[1]
    for i,ctr in enumerate(mag_ang):
        mag = ctr[0]
        ang = ctr[1]
        if X[mag][ang][x][y] > 10:
            inner_points.append((y, x))
This code works but it's pretty slow, and I wonder if there's a more pythonic/faster way to do this?
(EDIT: added a second alternate method)
Use numpy multi-array indexing:
import time
import numpy as np
n_mag, n_ang, n_x, n_y = 10, 12, 5, 6
shape = n_mag, n_ang, n_x, n_y
X = np.random.random_sample(shape) * 20
nb_indices = 100 # 61821
indices = np.c_[np.random.randint(0, n_x, nb_indices), np.random.randint(0, n_y, nb_indices)]
nb_mag_ang = 50 # 35182
mag_ang = np.c_[np.random.randint(0, n_mag, nb_mag_ang), np.random.randint(0, n_ang, nb_mag_ang)]
# original method
inner_points = []
start = time.time()
for x, y in indices:
    for mag, ang in mag_ang:
        if X[mag][ang][x][y] > 10:
            inner_points.append((y, x))
end = time.time()
print(end - start)
# faster method 1:
inner_points_faster1 = []
start = time.time()
for x, y in indices:
    if np.any(X[mag_ang[:, 0], mag_ang[:, 1], x, y] > 10):
        inner_points_faster1.append((y, x))
end = time.time()
print(end - start)
# faster method 2:
start = time.time()
# note: depending on the real size of mag_ang and indices, you may wish to do this the other way round ?
found = X[:, :, indices[:, 0], indices[:, 1]][mag_ang[:, 0], mag_ang[:, 1], :] > 10
# 'found' shape is (nb_mag_ang x nb_indices)
assert found.shape == (nb_mag_ang, nb_indices)
matching_indices_mask = found.any(axis=0)
inner_points_faster2 = indices[matching_indices_mask, :]
end = time.time()
print(end - start)
# finally assert equality of findings
inner_points = np.unique(np.array(inner_points))
inner_points_faster1 = np.unique(np.array(inner_points_faster1))
inner_points_faster2 = np.unique(inner_points_faster2)
assert np.array_equal(inner_points, inner_points_faster1)
assert np.array_equal(inner_points, inner_points_faster2)
(of course if you increase the shape the time will not be zero for the second and third)
Final note: here I use "unique" at the end, but it would maybe be wise to do it upfront for the indices and mag_ang arrays (except if you are sure that they are unique already)
Use numpy directly. If indices and mag_ang are numpy arrays of two columns each for the appropriate coordinate:
(x, y), (mag, ang) = indices.T, mag_ang.T
index_matrix = np.meshgrid(mag, ang, x, y).T.reshape(-1,4)
inner_mag, inner_ang, inner_x, inner_y = np.where(X[index_matrix] > 10)
Now the inner... variables hold arrays for each coordinate. To get a single list of pairs you can zip inner_y and inner_x.
Here are a few vectorized ways leveraging broadcasting -
thresh = 10
mask = X[mag_ang[:,0],mag_ang[:,1],indices[:,0,None],indices[:,1,None]]>thresh
r = np.where(mask)[0]
inner_points_out = indices[r][:,::-1]
For larger arrays, we can compare first and then index to get the mask -
mask = (X>thresh)[mag_ang[:,0],mag_ang[:,1],indices[:,0,None],indices[:,1,None]]
If you are only interested in the unique coordinates of indices, use the mask directly -
inner_points_out = indices[mask.any(1)][:,::-1]
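To see what the broadcasted index does, here is a small self-contained check on made-up data (the shapes and the seed are arbitrary). The two mag_ang columns have shape (M,) and the two indices columns have shape (K, 1), so the indexed result has shape (K, M): one row per entry of indices, one column per entry of mag_ang.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 5, 6, 7)) * 20
mag_ang = np.c_[rng.integers(0, 4, 8), rng.integers(0, 5, 8)]   # (8, 2)
indices = np.c_[rng.integers(0, 6, 10), rng.integers(0, 7, 10)] # (10, 2)
thresh = 10

mask = X[mag_ang[:, 0], mag_ang[:, 1],
         indices[:, 0, None], indices[:, 1, None]] > thresh     # (10, 8)
hits = indices[mask.any(axis=1)][:, ::-1]                       # (y, x) pairs

# Plain-loop reference: keep (y, x) if any (mag, ang) exceeds the threshold.
expected = [[int(y), int(x)] for x, y in indices
            if any(X[m, a, x, y] > thresh for m, a in mag_ang)]
```

Each row of indices appears at most once in hits, in its original order, which is what the plain loop with an early exit would produce.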
For large arrays, we can also leverage multi-cores with numexpr module.
Thus, first off import the module -
import numexpr as ne
Then, replace (X>thresh) with ne.evaluate('X>thresh') in the computation(s) listed earlier.
Use np.where
inner = np.where(X > 10)
a, b, x, y = inner
inner_points = np.vstack([y, x]).T

scipy.ndimage.interpolation.zoom uses nearest-neighbor-like algorithm for scaling-down

While testing scipy's zoom function, I found that the results of scaling down an array are similar to the nearest-neighbour algorithm, rather than averaging. This increases noise drastically, and is generally suboptimal for many applications.
Is there an alternative that does not use nearest-neighbor-like algorithm and will properly average the array when downsizing? While coarsegraining works for integer scaling factors, I would need non-integer scaling factors as well.
Test case: create a random 100*M x 100*M array, for M = 2..20
Downscale the array by the factor of M three ways:
1) by taking the mean in MxM blocks
2) by using scipy's zoom with a scaling factor 1/M
3) by taking the first point within each MxM block
Resulting arrays have the same mean, the same shape, but scipy's array has the variance as high as the nearest-neighbor. Taking a different order for scipy.zoom does not really help.
import scipy.ndimage.interpolation
import numpy as np
import matplotlib.pyplot as plt
mean1, mean2, var1, var2, var3 = [],[],[],[],[]
values = range(1,20) # down-scaling factors
for M in values:
    N = 100 # size of an array
    a = np.random.random((N*M,N*M)) # large array
    b = np.reshape(a, (N, M, N, M))
    b = np.mean(np.mean(b, axis=3), axis=1)
    assert b.shape == (N,N) #coarsegrained array
    c = scipy.ndimage.interpolation.zoom(a, 1./M, order=3, prefilter = True)
    assert c.shape == b.shape
    d = a[::M, ::M] # picking one random point within MxM block
    assert b.shape == d.shape
    # collect the statistics that are plotted below
    mean1.append(b.mean())
    mean2.append(c.mean())
    var1.append(b.var())
    var2.append(c.var())
    var3.append(d.var())
plt.plot(values, mean1, label = "Mean coarsegraining")
plt.plot(values, mean2, label = "mean scipy.zoom")
plt.plot(values, var1, label = "Variance coarsegraining")
plt.plot(values, var2, label = "Variance zoom")
plt.plot(values, var3, label = "Variance Neareset neighbor")
EDIT: Performance of scipy.ndimage.zoom on a real noisy image is also very poor
The original image is here
The code that produced it:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage.interpolation import zoom
im = Image.open("/home/magus/Downloads/lena_noisy.png")
im = np.array(im)
plt.imshow(im, cmap="Greys_r")
im2 = zoom(im, 1 / 8.)
plt.title("Scipy zoom 8x")
plt.imshow(im2, cmap="Greys_r", interpolation="none")
im.shape = (64, 8, 64, 8)
im3 = np.mean(im, axis=3)
im3 = np.mean(im3, axis=1)
plt.imshow(im3, cmap="Greys_r", interpolation="none")
plt.title("averaging over 8x8 blocks")
Nobody posted a working answer, so I will post a solution I currently use. Not the most elegant, but works.
import numpy as np
import scipy.ndimage
def zoomArray(inArray, finalShape, sameSum=False,
              zoomFunction=scipy.ndimage.zoom, **zoomKwargs):
    """
    Normally, one can use scipy.ndimage.zoom to do array/image rescaling.
    However, scipy.ndimage.zoom does not coarsegrain images well. It basically
    takes nearest neighbor, rather than averaging all the pixels, when
    coarsegraining arrays. This increases noise. Photoshop doesn't do that, and
    performs some smart interpolation-averaging instead.
    If you were to coarsegrain an array by an integer factor, e.g. 100x100 ->
    25x25, you just need to do block-averaging, that's easy, and it reduces
    noise. But what if you want to coarsegrain 100x100 -> 30x30?
    Then my friend you are in trouble. But this function will help you. This
    function will blow up your 100x100 array to a 120x120 array using
    scipy.ndimage.zoom. Then it will coarsegrain a 120x120 array by
    block-averaging in 4x4 chunks.
    It will do it independently for each dimension, so if you want a 100x100
    array to become a 60x120 array, it will blow up the first and the second
    dimension to 120, and then block-average only the first dimension.
    inArray: n-dimensional numpy array (1D also works)
    finalShape: resulting shape of an array
    sameSum: bool, preserve a sum of the array, rather than values.
             by default, values are preserved
    zoomFunction: by default, scipy.ndimage.zoom. You can plug your own.
    zoomKwargs: a dict of options to pass to zoomFunction.
    """
    inArray = np.asarray(inArray, dtype=np.double)
    inShape = inArray.shape
    assert len(inShape) == len(finalShape)
    mults = []  # multipliers for the final coarsegraining
    for i in range(len(inShape)):
        if finalShape[i] < inShape[i]:
            mults.append(int(np.ceil(inShape[i] / finalShape[i])))
        else:
            mults.append(1)
    # shape to which to blow up
    tempShape = tuple([i * j for i, j in zip(finalShape, mults)])
    # stupid zoom doesn't accept the final shape. Carefully crafting the
    # multipliers to make sure that it will work.
    zoomMultipliers = np.array(tempShape) / np.array(inShape) + 0.0000001
    assert zoomMultipliers.min() >= 1
    # applying scipy.ndimage.zoom
    rescaled = zoomFunction(inArray, zoomMultipliers, **zoomKwargs)
    for ind, mult in enumerate(mults):
        if mult != 1:
            sh = list(rescaled.shape)
            assert sh[ind] % mult == 0
            newshape = sh[:ind] + [sh[ind] // mult, mult] + sh[ind + 1:]
            rescaled.shape = newshape
            rescaled = np.mean(rescaled, axis=ind + 1)
    assert rescaled.shape == finalShape
    if sameSum:
        # ratio of output to input element counts
        extraSize = np.prod(finalShape) / np.prod(inShape)
        rescaled /= extraSize
    return rescaled
myar = np.arange(16).reshape((4,4))
rescaled = zoomArray(myar, finalShape=(3, 5))
FWIW i found that order=1 at least preserves the mean a lot better than the default or order=3 (as expected really)
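The block-averaging step that both the question and zoomArray rely on can be checked in isolation. A minimal sketch, assuming a 2-D array whose sides are exact multiples of the factor (the helper name block_mean is made up for this example): it compares the variance of uniform noise after averaging 4x4 blocks with the variance after keeping one pixel per block.

```python
import numpy as np

def block_mean(a, m):
    """Average non-overlapping m x m blocks of a 2-D array."""
    h, w = a.shape
    # Assumption: h and w are exact multiples of m.
    return a.reshape(h // m, m, w // m, m).mean(axis=(1, 3))

rng = np.random.default_rng(0)
noise = rng.random((400, 400))
averaged = block_mean(noise, 4)   # mean of 16 samples per output pixel
decimated = noise[::4, ::4]       # one sample per output pixel

print(averaged.var(), decimated.var())
```

Averaging 16 independent samples should cut the variance by roughly a factor of 16, while decimation leaves it unchanged; that gap is the noise difference described in the question.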

