Given a NumPy array (actually a 3-channel image), I need to map a function over it, but only where a triplet (i.e., an RGB pixel) satisfies a predefined condition. All the rest should be kept untouched.
I know how to set a constant value when a pixel meets a certain condition, but I don't know how to apply a function that takes the pixel's value as a parameter.
For instance, the following example sets to 128 all the pixels whose channels are all greater than 128:
import numpy as np
L = 128
img = np.random.randint(0, 255, (5, 5, 3))
img[(img > L).all(axis=2)] = np.array([128, 128, 128])
But what if I have to set a value that depends on the current value of the pixel?
The following code of course does not work:
import numpy as np
def smart_function(v):
    return v//2
L = 128
img = np.random.randint(0, 255, (5, 5, 3))
img[(img > L).all(axis=2)] = smart_function(img)
I also tried with vectorize with no success:
import numpy as np
def smart_function(v):
    return v//2
vf = np.vectorize(smart_function)
L = 128
img = np.random.randint(0, 255, (5, 5, 3))
img[(img > L).all(axis=2)] = vf(img)
Edit
To explain my request better, this is the expected behaviour written in plain Python. Obviously this code is far too slow to be usable, but it gives the idea:
for y in range(img.shape[0]):
    for x in range(img.shape[1]):
        pixel = img[y, x]
        if pixel[0] > L and pixel[1] > L and pixel[2] > L:
            img[y, x] = smart_function(pixel)
You could use frompyfunc with at:
import numpy as np
xs = np.random.randn(5, 5)
def f(x):
    return np.round(x)
f = np.frompyfunc(f, nin=1, nout=1)
f.at(xs, xs > 0)
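For the 3-channel case in the question, plain boolean-mask indexing may be enough, since smart_function is itself already vectorized; a minimal sketch (assuming the same img and L as in the question):

import numpy as np

L = 128
img = np.random.randint(0, 255, (5, 5, 3))
mask = (img > L).all(axis=2)   # pixels whose channels are all greater than L
img[mask] = img[mask] // 2     # apply the function only to the selected pixels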
I'm interested in a version of Increment Numpy multi-d array with repeated indices where the array is indexed with a cross-product.
In particular, I want to perform the operation done by the following code using matrix operations to accelerate it:
import numpy as np

def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    for w in range(W):
        for h in range(H):
            i, j = int(w / grid_size), int(h / grid_size)
            s[i, j] += image[w, h]
    return s
My idea was to compute all the (i, j) indices at once and use NumPy's ix_ function to index the matrix s:
def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    w_idx, h_idx = np.arange(W), np.arange(H)
    x_idx, y_idx = np.trunc(w_idx / grid_size).astype(int), np.trunc(h_idx / grid_size).astype(int)
    s[np.ix_(x_idx, y_idx)] += image
    return s
The code above is easier to understand with the example from NumPy's documentation:
Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].
In my case, it's likely that some indices will be repeated (for example, with grid_size=2, int(0 / grid_size) = int(1 / grid_size)). And that's where the Increment Numpy multi-d array with repeated indices question comes in.
In case the indices are repeated, I would like to update the matrix with the image value as many times as the index appears. I could not find any solution to this problem that avoids additional loops (e.g., zipping the indices still essentially requires materializing the actual cross-product of the indices for s and the image).
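For reference, NumPy's ufunc.at is designed for exactly this accumulate-with-repeated-indices pattern; a minimal sketch of the loop rewritten with np.add.at (assuming a positive integer grid_size; the function name is mine):

import numpy as np

def get_s_add_at(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    x_idx = np.arange(W) // grid_size   # destination row for each source row
    y_idx = np.arange(H) // grid_size   # destination column for each source column
    # np.add.at performs unbuffered addition, so repeated indices accumulate
    np.add.at(s, (x_idx[:, None], y_idx[None, :]), image)
    return s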
I don't think this is the best way to do it but here's one way.
import numpy as np
image = np.arange(9).reshape(3, 3)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2])
# find unique destinations
idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
# create mask for the source and sum the source pixels headed to the same destination
idxs_repeated = idxs[None, :, :].repeat(len(idxs_unique), axis = 0)
image_mask = (idxs_repeated == idxs_unique[:, None, :]).all(-1)
pixel_sum = (image.flatten()[None, :]*image_mask).sum(-1)
# assign summed sources to destination
s[tuple(idxs_unique.T)] += pixel_sum
EDIT 1:
If you run into problems caused by memory constraints you can do the image masking and summation in batches as done in the following implementation. I set the batch size to 10 but that parameter can be set to whatever works on your machine.
import numpy as np
image = np.arange(12).reshape(3, 4)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2, 1])
idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
batch_size = 10
pixel_sum = []
for i in range(len(idxs_unique)//batch_size + ((len(idxs_unique)%batch_size)!=0)):
    batch = idxs_unique[i*batch_size:(i+1)*batch_size, None, :]
    idxs_repeated = idxs[None, :, :].repeat(len(batch), axis = 0)
    image_mask = (idxs_repeated == idxs_unique[i*batch_size:(i+1)*batch_size, None, :]).all(-1)
    pixel_sum.append((image.flatten()[None, :]*image_mask).sum(-1))
pixel_sum = np.concatenate(pixel_sum)
s[tuple(idxs_unique.T)] += pixel_sum
EDIT 2:
OP's method seems to be faster by far if you use numba.
import numpy as np
from numba import jit
@jit(nopython=True)
def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    for w in range(W):
        for h in range(H):
            i, j = int(w / grid_size), int(h / grid_size)
            s[i, j] += image[w, h]
    return s
def get_s_vec(image, grid_size, batch_size = 10):
    W, H = image.shape
    s = np.zeros((W, H))
    w_idx, h_idx = np.arange(W), np.arange(H)
    x_idx, y_idx = np.trunc(w_idx / grid_size).astype(int), np.trunc(h_idx / grid_size).astype(int)
    y_idx, x_idx = np.meshgrid(y_idx, x_idx)
    idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
    idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
    pixel_sum = []
    for i in range(len(idxs_unique)//batch_size + ((len(idxs_unique)%batch_size)!=0)):
        batch = idxs_unique[i*batch_size:(i+1)*batch_size, None, :]
        idxs_repeated = idxs[None, :, :].repeat(len(batch), axis = 0)
        image_mask = (idxs_repeated == idxs_unique[i*batch_size:(i+1)*batch_size, None, :]).all(-1)
        pixel_sum.append((image.flatten()[None, :]*image_mask).sum(-1))
    pixel_sum = np.concatenate(pixel_sum)
    s[tuple(idxs_unique.T)] += pixel_sum
    return s
print(f'loop result = {get_s(image, 2)}')
print(f'vector result = {get_s_vec(image, 2)}')
%timeit get_s(image, 2)
%timeit get_s_vec(image, 2)
output:
loop result = [[10. 18. 0. 0.]
[17. 21. 0. 0.]
[ 0. 0. 0. 0.]]
vector result = [[10. 18. 0. 0.]
[17. 21. 0. 0.]
[ 0. 0. 0. 0.]]
The slowest run took 15.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 751 ns per loop
1000 loops, best of 5: 195 µs per loop
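For completeness, np.bincount over a flattened destination index is another loop-free formulation of the same accumulation; a minimal sketch (assuming a positive integer grid_size; the function name is mine):

import numpy as np

def get_s_bincount(image, grid_size):
    W, H = image.shape
    x_idx = np.arange(W) // grid_size
    y_idx = np.arange(H) // grid_size
    # linearize each (i, j) destination into a single integer index
    flat = (x_idx[:, None] * H + y_idx[None, :]).ravel()
    sums = np.bincount(flat, weights=image.ravel(), minlength=W * H)
    return sums.reshape(W, H)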
Does skimage.measure.block_reduce do what you want?
from skimage.measure import block_reduce
s = block_reduce(image, block_size=(grid_size, grid_size), func=np.sum)
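One caveat: block_reduce returns an array of the reduced shape (ceil(W / grid_size), ceil(H / grid_size)), padding partial edge blocks with zeros, whereas the loop version returns a (W, H) array that is zero outside the top-left corner. If that layout matters, a small sketch of embedding the reduced result back:

import numpy as np
from skimage.measure import block_reduce

image = np.arange(12).reshape(3, 4)
grid_size = 2
reduced = block_reduce(image, block_size=(grid_size, grid_size), func=np.sum)
# reduced.shape == (2, 2); place it in the top-left corner of a (W, H) array
s = np.zeros(image.shape)
s[:reduced.shape[0], :reduced.shape[1]] = reduced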
I'm trying to extract small 3D patches (example patch size 20x20x4) from a 3D image of size 250x250x250, with stride 1 along every axis. I'll be extracting all possible patches, since I'll be running a function on each patch and returning the result in the form of a 3D image, with the result for the current patch assigned to the center voxel of that patch. To extract the patches I'll be using the code below:
import numpy as np
from numpy.lib import stride_tricks
def cutup(data, blck, strd):
    sh = np.array(data.shape)
    blck = np.asanyarray(blck)
    strd = np.asanyarray(strd)
    nbl = (sh - blck) // strd + 1
    strides = np.r_[data.strides * strd, data.strides]
    dims = np.r_[nbl, blck]
    data6 = stride_tricks.as_strided(data, strides=strides, shape=dims)
    return data6.reshape(-1, *blck)
#demo
x = np.zeros((250,250,250), int)
y = cutup(x, (20, 20, 4), (1, 1, 1))
I'm running this on Google Colab, which has around 12 GB of RAM. Since the result is a very large number of patches, I'm getting a large alloc error and then the kernel restarts. I think splitting the image into parts would work, but if I do so, how should I write the code so that it considers the neighbouring voxels? Is there a smart way to do this?
Don't reshape the newly strided array/view before returning.
def cutup(data, blck, strd):
    sh = np.array(data.shape)
    blck = np.asanyarray(blck)
    strd = np.asanyarray(strd)
    nbl = (sh - blck) // strd + 1
    strides = np.r_[data.strides * strd, data.strides]
    dims = np.r_[nbl, blck]
    data6 = stride_tricks.as_strided(data, strides=strides, shape=dims)
    return data6
Then iterate over the patches.
p = np.zeros((250,250,250), int)
q = cutup(p, (20, 20, 4), (1, 1, 1))
print(f'windowed shape : {q.shape}')
print()
for i,x in enumerate(q):
    print(f'x.shape:{x.shape}')
    for j,y in enumerate(x):
        print(f'\ty.shape:{y.shape}')
        for k,z in enumerate(y):
            print(f'\t\tz.shape:{z.shape}')
            if k==5: break
        break
    break
>>>
windowed shape : (231, 231, 247, 20, 20, 4)
x.shape:(231, 247, 20, 20, 4)
y.shape:(247, 20, 20, 4)
z.shape:(20, 20, 4)
z.shape:(20, 20, 4)
z.shape:(20, 20, 4)
z.shape:(20, 20, 4)
z.shape:(20, 20, 4)
z.shape:(20, 20, 4)
Your example will produce an array (or a view of the array) with a shape of (231, 231, 247, 20, 20, 4), that is, over thirteen million 3-d patches.
That will solve your memory allocation problem.
When I try to reshape it to (231, 231, 247, -1), I get a large alloc error.
If your operation requires the last three dimensions to be flattened, do that in your iteration.
for i,x in enumerate(q):
    for j,y in enumerate(x):
        for k,z in enumerate(y):
            z = z.reshape(-1)
            print(f'\t\tz.shape:{z.shape}')
            if k==5: break
        break
    break
Looks like you can do that reshape in the outermost loop - at least for a zeros array.
for i,x in enumerate(q):
    zero,one,*last = x.shape
    x = x.reshape(zero,one,-1)
    print(f'x.shape:{x.shape}')
    for j,y in enumerate(x):
        print(f'\ty.shape:{y.shape}')
        for k,z in enumerate(y):
            print(f'\t\tz.shape:{z.shape}')
            break
        break
    break
>>>
x.shape:(231, 247, 1600)
y.shape:(247, 1600)
z.shape:(1600,)
Is there a smart way to do this?
If you can figure out how to vectorize your operation so that you only need to iterate over the first dimension or the first and second dimensions you can speed up your processing. That should be a separate question if you encounter problems.
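As an aside, on NumPy 1.20+ the same zero-copy view can be built with sliding_window_view instead of hand-rolled strides, at least for the stride-1 case in the question; a minimal sketch:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

p = np.zeros((250, 250, 250), int)
q = sliding_window_view(p, (20, 20, 4))   # a view, no copy is made
print(q.shape)                            # (231, 231, 247, 20, 20, 4)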
I want to generate a new a○b vector from a and b (○ means element-wise multiplication). My code is below, but the performance looks bad because of the for loops. Is there a more efficient way?
import torch

# batch_size, a_len, b_len, hid_dim are assumed to be defined
a = torch.rand(batch_size, a_len, hid_dim)
b = torch.rand(batch_size, b_len, hid_dim)
a_elmwise_mul_b = torch.zeros(batch_size, a_len, b_len, hid_dim)
for sample in range(batch_size):
    for ai in range(a_len):
        for bi in range(b_len):
            a_elmwise_mul_b[sample, ai, bi] = torch.mul(a[sample, ai], b[sample, bi])
Update
I updated my code referring to Ahmad's answer. Thank you.
N = 16
hid_dim = 50
a_seq_len = 10
b_seq_len = 20
a = torch.randn(N, a_seq_len, hid_dim)
b = torch.randn(N, b_seq_len, hid_dim)
shape = (N, a_seq_len, b_seq_len, hid_dim)
a_dash = a.unsqueeze(2) # (N, a_len, 1, hid_dim)
b_dash = b.unsqueeze(1) # (N, 1, b_len, hid_dim)
a_dash = a_dash.expand(shape)
b_dash = b_dash.expand(shape)
print(a_dash.size(), b_dash.size())
mul = a_dash * b_dash
print(mul.size())
----------
torch.Size([16, 10, 20, 50]) torch.Size([16, 10, 20, 50])
torch.Size([16, 10, 20, 50])
From your problem definition, it looks like you want to multiply two tensors, say A and B of shape AxE and BxE, and get a tensor of shape AxBxE. It means you want to multiply each row of tensor A with the whole tensor B. If that is correct, then we don't call it element-wise multiplication.
You can accomplish your goal as follows.
import torch
# batch_size = 16, a_len = 10, b_len = 20, hid_dim = 50
a = torch.rand(16, 10, 50)
b = torch.rand(16, 20, 50)
c = a.unsqueeze(2).expand(*a.size()[:-1], b.size(1), a.size()[-1])
d = b.unsqueeze(1).expand(b.size()[0], a.size(1), *b.size()[1:])
print(c.size(), d.size())
mul = c * d # shape of c, d: 16 x 10 x 20 x 50
print(mul.size()) # 16 x 10 x 20 x 50
Here, the mul tensor is your desired result. Just to clarify, the above two lines related to the computation of c and d are equivalent to:
c = a.unsqueeze(2).expand(a.size(0), a.size(1), b.size(1), a.size(2))
d = b.unsqueeze(1).expand(b.size(0), a.size(1), b.size(1), b.size(2))
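As a side note, the explicit expand calls can also be left to broadcasting, since size-1 dimensions broadcast automatically under *; torch.einsum is an equivalent spelling. A minimal sketch using the a and b defined above:

mul = a.unsqueeze(2) * b.unsqueeze(1)              # (16, 10, 1, 50) * (16, 1, 20, 50) -> (16, 10, 20, 50)
mul_einsum = torch.einsum('nae,nbe->nabe', a, b)   # same result via einsum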
I have a 4d theano tensor (with the shape (1, 700, 16, 95000) for example) and a 4d 'mask' tensor with the shape (1, 700, 16, 1024) such that every element in the mask is an index that I need from the original tensor. How can I use my mask to index my tensor? Things like sample[mask] or sample[:, :, :, mask] don't really seem to work.
I also tried using a binary mask but since the tensor is rather large I get a 'device out of memory' exception.
Other ideas on how to get my indices from the tensor would also be very appreciated.
Thanks
So in the absence of an answer, I've decided to use the more computationally intensive solution, which is flattening both my data and indices tensors, adding an offset to the indices to bring them to global positions, indexing the data, and reshaping it back to the original shape.
I'm adding here my test code, including a (commented-out) solution for matrices.
import numpy as np
import theano
import theano.tensor as T

def theano_convertion(els, inds, offsets):
    els = T.flatten(els)
    inds = T.flatten(inds) + offsets
    return T.reshape(els[inds], (2, 3, 16, 5))
if __name__ == '__main__':
    # command: np.transpose(t[range(2), indices])
    # t = np.random.randint(0, 10, (2, 20))
    # indices = np.random.randint(0, 10, (5, 2))
    t = np.random.randint(0, 10, (2, 3, 16, 20)).astype('int32')
    indices = np.random.randint(0, 10, (2, 3, 16, 5)).astype('int32')
    offsets = np.asarray(range(1, 2 * 3 * 16 + 1), dtype='int32')
    offsets = (offsets * 20) - 20
    offsets = np.repeat(offsets, 5)
    offsets_tens = T.ivector('offsets')
    inds_tens = T.itensor4('inds')
    t_tens = T.itensor4('t')
    func = theano.function(
        [t_tens, inds_tens, offsets_tens],
        [theano_convertion(t_tens, inds_tens, offsets_tens)]
    )
    shaped_elements = []
    flattened_elements = []
    [tmp] = func(t, indices, offsets)
    for i in range(2):
        for j in range(3):
            for k in range(16):
                shaped_elements.append(t[i, j, k, indices[i, j, k, :]])
                flattened_elements.append(tmp[i, j, k, :])
                print(shaped_elements[-1] == flattened_elements[-1])
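For comparison, the equivalent gather in plain NumPy is np.take_along_axis, which can serve as a sanity check on the Theano result; a minimal sketch reusing the t, indices, and tmp from the test code above:

import numpy as np

expected = np.take_along_axis(t, indices, axis=-1)   # gathers along the last axis, shape (2, 3, 16, 5)
print(np.array_equal(expected, tmp))                 # should print True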