numpy array slicing, get one from each third dimension - python

I have a 3D array of data. I have a 2D array of indices, where the shape matches the first two dimensions of the data array, and it specifies the indices I want to pluck from the data array to make a 2D array. E.g.:
import numpy as np
a = np.arange(3 * 5 * 7).reshape((3, 5, 7))
getters = np.array([0, 1, 2] * 5).reshape(3, 5)
What I'm looking for is a syntax like a[:, :, getters] that returns an array of shape (3,5) by indexing independently into the third dimension of each item. However, a[:, :, getters] returns an array of shape (3,5,3,5). I can do it by iterating and building a new array, but this is pretty slow:
np.array([[col[getters[ri, ci]] for ci, col in enumerate(row)] for ri, row in enumerate(a)])
# gives array([[ 0,  8,  16,  21,  29],
#              [ 37, 42,  50,  58,  63],
#              [ 71, 79,  84,  92, 100]])
Is there a neat+fast way?

If I understand you correctly, I've done something like this using fancy indexing:
>>> k,j = np.meshgrid(np.arange(a.shape[1]),np.arange(a.shape[0]))
>>> k
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
>>> j
array([[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2]])
>>> a[j,k,getters]
array([[ 0, 8, 16, 21, 29],
[ 37, 42, 50, 58, 63],
[ 71, 79, 84, 92, 100]])
Of course, you can keep k and j around and reuse them as often as you like. As DSM pointed out in the comments below, j,k = np.indices(a.shape[:2]) also works instead of meshgrid. Which of the two is faster apparently depends on the number of elements involved.
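On NumPy 1.15 or later, np.take_along_axis can do the same pick without building the index grids yourself. This is a minimal sketch assuming a and getters are built as in the question: the index array must have as many dimensions as a, hence the added trailing axis, which is then dropped from the result.

```python
import numpy as np

a = np.arange(3 * 5 * 7).reshape((3, 5, 7))
getters = np.array([0, 1, 2] * 5).reshape(3, 5)

# take_along_axis needs an index array with the same ndim as `a`,
# so give getters a length-1 third axis, then drop that axis again.
picked = np.take_along_axis(a, getters[:, :, None], axis=2)[:, :, 0]

# Same result as the np.indices variant mentioned above.
j, k = np.indices(a.shape[:2])
assert np.array_equal(picked, a[j, k, getters])
print(picked)
```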

Related

Tensor manipulation - creating a positional tensor from a given tensor

I have an input tensor which has zero padding at the start and then a sequence of values. So something like:
x = torch.tensor([[0, 2, 8, 12],
[0, 0, 6, 3]])
What I need is another tensor having same shape and retaining 0's for the padding and an increasing sequence for the rest of the numbers. So my output tensor should be:
y = torch.tensor([[0, 1, 2, 3],
[0, 0, 1, 2]])
I tried something like:
MAX_SEQ=4
seq_start = np.nonzero(x)
start = seq_start[0][0]
pos_id = torch.cat((torch.from_numpy(np.zeros(start, dtype=int)).to(device), torch.arange(1, MAX_SEQ-start+1).to(device)), 0)
print(pos_id)
This works if the tensor is 1-dimensional, but it needs additional logic to handle the 2-D shape. That could be done since np.nonzero returns a tuple, and we could probably loop through those tuples updating a counter or something. However, I am sure there must be a simple tensor operation that does this in 1-2 lines of code, and perhaps more efficiently.
Help appreciated.
A possible solution in three small steps:
Find the index of the first non zero element for each row. This can be done with a trick explained here (adapted here for non-binary tensors).
> idx = torch.arange(x.shape[1], 0, -1)
tensor([4, 3, 2, 1])
> xbin = torch.where(x == 0, 0, 1)
tensor([[0, 1, 1, 1],
[0, 0, 1, 1]])
> xbin*idx
tensor([[0, 3, 2, 1],
[0, 0, 2, 1]])
> indices = torch.argmax(xbin*idx, dim=1, keepdim=True)
tensor([[1],
[2]])
Create a sequence (an arange) for the resulting tensor, ignoring padding. This can be done by applying repeat and view to a torch.arange call:
> rows, cols = x.shape
> seq = torch.arange(1, cols+1).repeat(1, rows).view(-1, cols)
tensor([[1, 2, 3, 4],
[1, 2, 3, 4]])
Lastly, here's the trick: for each row, we subtract the index of the first non-zero element from the sequence. Then we mask the padding positions and replace them with zeros:
> pos_id = seq - indices
tensor([[ 0, 1, 2, 3],
[-1, 0, 1, 2]])
> mask = indices > seq - 1
tensor([[ True, False, False, False],
[ True, True, False, False]])
> pos_id[mask] = 0
tensor([[0, 1, 2, 3],
[0, 0, 1, 2]])
Expanding Ivan's nice answer to include a batch dimension, since my model has one. This seems to work; it's posted just for reference in case more than 2 dimensions need to be handled:
import torch
x = torch.tensor([[[ 0, 0, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]],
[[0, 0, 0, 0, 0, 0, 26, 27, 28, 29],
[0, 31, 32, 33, 34, 35, 36, 37, 38, 39]],
[[0, 0, 42, 43, 44, 45, 46, 47, 48, 49],
[0, 0, 0, 53, 0, 55, 56, 57, 58, 59]]])
bs, rows, cols = x.shape
seq = torch.arange(1, cols+1).repeat(1, rows).repeat(1, bs).view(bs, rows, cols)
idx = torch.arange(x.shape[-1], 0, -1)
xbin = torch.where(x == 0, 0, 1)
indices = torch.argmax(xbin*idx, dim=2, keepdim=True)
pos_id = seq - indices
mask = indices > seq - 1
pos_id[mask] = 0
print(pos_id)
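A different technique from the argmax trick is two cumulative sums along the last axis: the first marks everything from the first non-zero element onward, and the second counts those marks. This sketch is written with NumPy arrays; PyTorch has torch.cumsum(..., dim=-1), but treat the tensor translation as an assumption. Because it only works along the last axis, the same function handles both the 2D input and the batched 3D input above. One difference in behavior: an all-zero row stays all zeros here, whereas the argmax version numbers it 1..n, because argmax returns 0 for such a row.

```python
import numpy as np

def position_ids(x):
    """0 for leading padding zeros, then 1, 2, 3, ... from the first
    non-zero element of each row onward (along the last axis)."""
    # True from the first non-zero element to the end of the row.
    started = np.cumsum(x != 0, axis=-1) > 0
    # Counting the True values yields 1, 2, 3, ... after the padding.
    return np.cumsum(started, axis=-1)

x = np.array([[0, 2, 8, 12],
              [0, 0, 6, 3]])
print(position_ids(x))
```

Zeros that appear after the first non-zero value (as in [0, 0, 0, 53, 0, 55, ...]) still get a position number, matching the batched output above.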

Python, Indexing and assigning to Np Array

To improve the speed, I would like to avoid for loops.
I have an image array that looks like:
image = np.zeros(shape=(480, 640, 1), dtype=np.uint8)
and a structured NumPy array Events with the following dtype:
dtype = [('x', '<f8'), ('y', '<f8'), ('grayVal', '<u2')]
where 'x' = row and 'y' = column of the image array.
The question is:
How can I assign the grayVal values in Events to all the corresponding x and y positions in the image?
So far I have tried (among other things not shown here):
The For Loop:
for event in Events:
image[event['y'],event['x']] = event['grayVal']
and indexing:
events['y'].shape
(98210,)
events['x'].shape
(98210,)
events['grayVal'].shape
(98210,)
image[np.ix_(events['y'],events['x'])] = events['grayVal']
which fails with the error message:
ValueError: shape mismatch: value array of shape (98210,) could not be broadcast to indexing result of shape (98210,98210,1)
What am I missing? Thanks for the help.
Let's work with a small example, one we can actually examine and play with!
Make a structured array:
In [32]: dt = np.dtype([('x', int),('y', int) ,('grayVal','u2')])
In [33]: events = np.zeros(5, dt)
In [34]: events['x'] = np.arange(5)
In [35]: events['y'] = np.array([3,4,0,2,1])
In [36]: events['grayVal'] = np.arange(1,6)
To examine indexing, let's make a nice 2d array:
In [38]: image = np.arange(25).reshape(5,5)
In [39]: image
Out[39]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
Look at what ix_ produces: two arrays that can broadcast against each other, a (5,1) and a (1,5), which broadcast to (5,5):
In [40]: np.ix_(events['y'], events['x'])
Out[40]:
(array([[3],
[4],
[0],
[2],
[1]]),
array([[0, 1, 2, 3, 4]]))
Using those arrays to index image just shuffles values - the result is still a 2d array:
In [41]: image[np.ix_(events['y'], events['x'])]
Out[41]:
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[ 5, 6, 7, 8, 9]])
If instead we index with the field arrays themselves, not with the ix_ arrays:
In [42]: image[events['y'], events['x']]
Out[42]: array([15, 21, 2, 13, 9])
This is just the diagonal of the array produced with ix_. Indexing with two (n,) arrays produces an (n,) array of values (as opposed to the (n,n) array from ix_).
So starting with a zeros image, we can assign values with:
In [43]: image= np.zeros((5,5), 'u2')
In [44]: image[events['y'], events['x']]=events['grayVal']
In [45]: image
Out[45]:
array([[0, 0, 3, 0, 0],
[0, 0, 0, 0, 5],
[0, 0, 0, 4, 0],
[1, 0, 0, 0, 0],
[0, 2, 0, 0, 0]], dtype=uint16)
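Two details from the question still need handling with real data. The coordinates are stored as float64 ('<f8'), and NumPy rejects float arrays as indices with an IndexError. The question's image also has a trailing length-1 channel axis, so image[rows, cols] selects an (n, 1) block that an (n,) value array cannot broadcast into. A minimal sketch with a small hypothetical events array, keeping the answer's image[y, x] ordering (swap the two if x really is the row):

```python
import numpy as np

# Hypothetical events using the question's dtype (float coordinates).
dt = np.dtype([('x', '<f8'), ('y', '<f8'), ('grayVal', '<u2')])
events = np.array([(0.0, 3.0, 1), (1.0, 4.0, 2), (2.0, 0.0, 3)], dtype=dt)

image = np.zeros((5, 5, 1), dtype=np.uint8)

# Float arrays cannot index, so convert to integers first.
# np.rint rounds to the nearest pixel; astype alone would truncate instead.
rows = np.rint(events['y']).astype(np.intp)
cols = np.rint(events['x']).astype(np.intp)

# Select the single channel explicitly so the target has shape (n,),
# the same shape as events['grayVal'].
image[rows, cols, 0] = events['grayVal']
```

If the same pixel appears more than once in the events, NumPy does not guarantee which of the repeated assignments ends up in the image. Gray values above 255 will also not fit the uint8 image.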
I can only think of a slow version, with a for loop for now. But that could be OK if the array is sparse. Maybe someone else can vectorize that.
import numpy as np
image = np.zeros(shape=(3,4 ),dtype=np.uint8) # image is empty
# evy is just a bag of nonzero pixels
evy=np.zeros(shape=(3), dtype = [('x', '<u2'),('y', '<u2') ,('grayVal','<u2') ])
evy[0]=(1,1,128)
evy[1]=(0,0,1)
evy[2] =(2,3,255)
# slow version
for i in range(3):
image[evy[i][0],evy[i][1]]=evy[i][2]
output:
array([[ 1, 0, 0, 0],
[ 0, 128, 0, 0],
[ 0, 0, 0, 255]], dtype=uint8)
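The loop above can be vectorized by indexing with whole field arrays, the same pattern as in the previous answer. A sketch reusing the example data, with 'x' as the row and 'y' as the column exactly as the loop uses them:

```python
import numpy as np

image = np.zeros(shape=(3, 4), dtype=np.uint8)
evy = np.zeros(shape=(3,), dtype=[('x', '<u2'), ('y', '<u2'), ('grayVal', '<u2')])
evy[0] = (1, 1, 128)
evy[1] = (0, 0, 1)
evy[2] = (2, 3, 255)

# One fancy-indexing assignment replaces the Python loop:
# row indices from 'x', column indices from 'y'.
image[evy['x'], evy['y']] = evy['grayVal']
print(image)
```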

numpy map 2D array values

I'm trying to map the values of a 2D NumPy array, i.e. to iterate (efficiently) over rows and append values based on the row index.
One of approaches I have tried is:
source = misc.imread(fname) # Load some image
img = np.array(source, dtype=np.float64) / 255 # Cast and normalize values
w, h, d = tuple(img.shape) # Get dimensions
img = np.reshape(img, (w * h, d)) # Flatten 3D to 2D
# The actual problem:
# Map (R, G, B) pixels to (R, G, B, X, Y) to preserve position
img_data = ((px[0], px[1], px[2], idx % w, int(idx // w)) for idx, px in enumerate(img))
img_data = np.fromiter(img_data, dtype=tuple) # Get back to np.array
but the solution raises: ValueError: cannot create object arrays from iterator
Can anyone suggest how to perform this absurdly simple operation efficiently in NumPy? I can't wrap my head around how intricate this library is... And why does that code consume a few gigabytes of memory for a 7000x5000 px image?
Thanks
maybe np.concatenate and np.indices:
np.concatenate((np.arange(40).reshape((4,5,2)), *np.indices((4,5,1))), axis=-1)[:,:,:-1]
Out[264]:
array([[[ 0, 1, 0, 0],
[ 2, 3, 0, 1],
[ 4, 5, 0, 2],
[ 6, 7, 0, 3],
[ 8, 9, 0, 4]],
[[10, 11, 1, 0],
[12, 13, 1, 1],
[14, 15, 1, 2],
[16, 17, 1, 3],
[18, 19, 1, 4]],
[[20, 21, 2, 0],
[22, 23, 2, 1],
[24, 25, 2, 2],
[26, 27, 2, 3],
[28, 29, 2, 4]],
[[30, 31, 3, 0],
[32, 33, 3, 1],
[34, 35, 3, 2],
[36, 37, 3, 3],
[38, 39, 3, 4]]])
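The [:,:,:-1] strips an 'extra' all-zero entry (np.indices((4,5,1)) also returns an index array for the length-1 third axis, which is always 0); maybe there's a better way.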
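One way to avoid the extra entry is to take np.indices over only the first two dimensions and move its leading axis to the end. A sketch of that, plus the question's (R, G, B, X, Y) layout on a small hypothetical RGB image, assuming X means the column and Y the row. In the question's generator, with w, h, d = img.shape, the flattened index idx corresponds to row idx // h and column idx % h, so idx % w and idx // w are only right for square images.

```python
import numpy as np

data = np.arange(40).reshape((4, 5, 2))   # stand-in for a (rows, cols, channels) array

# np.indices(shape) has shape (2, rows, cols): [0] = row index, [1] = column index.
# Moving that leading axis to the end gives (rows, cols, 2), ready to concatenate.
coords = np.indices(data.shape[:2]).transpose(1, 2, 0)
out = np.concatenate((data, coords), axis=-1)   # (4, 5, 4), nothing to strip

# (R, G, B, X, Y) per pixel, with X = column and Y = row, one pixel per output row.
img = np.random.rand(3, 4, 3)                   # hypothetical small RGB float image
rows, cols = np.indices(img.shape[:2])
pixels = np.dstack((img, cols, rows)).reshape(-1, img.shape[2] + 2)
```

The result is a single float64 array, so for a 7000x5000 image with 5 values per pixel it needs about 7000 * 5000 * 5 * 8 bytes, roughly 1.4 GB; starting from float32 data would halve that.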

Use 2D matrix as indexes for a 3D matrix in numpy?

Say I have an array of shape 2x3x3, which is a 3D matrix. I also have a 2D matrix of shape 3x3 that I would like to use as indices for the 3D matrix along the first axis. Example is below.
Example run:
>>> np.random.randint(0,2,(3,3)) # index
array([[0, 1, 0],
[1, 0, 1],
[1, 0, 0]])
>>> np.random.randint(0,9,(2,3,3)) # 3D matrix
array([[[4, 4, 5],
[2, 6, 7],
[2, 6, 2]],
[[4, 0, 0],
[2, 7, 4],
[4, 4, 0]]])
>>> np.array([[4,0,5],[2,6,4],[4,6,2]]) # result
array([[4, 0, 5],
[2, 6, 4],
[4, 6, 2]])
It seems you are using the 2D array as an index array and the 3D array to select values from. Thus, you could use NumPy's advanced indexing -
# a : 2D array of indices, b : 3D array from where values are to be picked up
m,n = a.shape
I,J = np.ogrid[:m,:n]
out = b[a, I, J] # or b[a, np.arange(m)[:,None],np.arange(n)]
If you meant to use a to index into the last axis instead, just move a there : b[I, J, a].
Sample run -
>>> np.random.seed(1234)
>>> a = np.random.randint(0,2,(3,3))
>>> b = np.random.randint(11,99,(2,3,3))
>>> a # Index array
array([[1, 1, 0],
[1, 0, 0],
[0, 1, 1]])
>>> b # values array
array([[[60, 34, 37],
[41, 54, 41],
[37, 69, 80]],
[[91, 84, 58],
[61, 87, 48],
[45, 49, 78]]])
>>> m,n = a.shape
>>> I,J = np.ogrid[:m,:n]
>>> out = b[a, I, J]
>>> out
array([[91, 84, 37],
[61, 54, 41],
[37, 49, 78]])
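On NumPy 1.15 or later, np.take_along_axis expresses the same selection along the first axis without the ogrid grids. A sketch using the example from the question; the index array gets a leading length-1 axis so its ndim matches b, and that axis is dropped afterwards:

```python
import numpy as np

# The question's example: `a` picks, per (row, col), which slice of b to use.
a = np.array([[0, 1, 0],
              [1, 0, 1],
              [1, 0, 0]])
b = np.array([[[4, 4, 5], [2, 6, 7], [2, 6, 2]],
              [[4, 0, 0], [2, 7, 4], [4, 4, 0]]])

out = np.take_along_axis(b, a[None, :, :], axis=0)[0]

# Cross-check against the ogrid-based advanced indexing.
m, n = a.shape
I, J = np.ogrid[:m, :n]
assert np.array_equal(out, b[a, I, J])
print(out)
```

For the last-axis variant, np.take_along_axis(b, a[:, :, None], axis=2)[:, :, 0] plays the role of b[I, J, a].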
If your indexes remain binary (only 0s and 1s), you could also skip the index grids entirely:
np.where(a, b[1], b[0])
But other than that corner case (or if you like code golfing one-liners) the other answer is probably better.
There is a numpy function off-the-shelf: np.choose.
It also comes with some handy broadcast options.
import numpy as np
cube = np.arange(18).reshape((2,3,3))
sel = np.array([[1, 0, 1], [0, 1, 1], [0,1,0]])
the_selection = np.choose(sel, cube)
>>> the_selection
array([[ 9, 1, 11],
[ 3, 13, 14],
[ 6, 16, 8]])
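This method works with any 3D array whose first axis holds the choices. Note that np.choose limits how many choices it accepts (32 in NumPy 1.x), so for a long first axis the advanced-indexing answer is safer.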

Numpy: Broadcasting from submatrix

Given two 2D arrays:
A =[[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]]
B =[[1, 2],
[3, 4]]
A - B = [[ 0, -1, 1, 0],
[-2, -3, -1, -2],
[ 2, 1, 3, 2],
[ 0, -1, 1, 0]]
B's shape is 2,2, A's is 4,4. I want to perform a broadcast subtraction of B over A: A - B.
I specifically want to use broadcasting as the array sizes I am dealing with are very large (8456,8456). I am hoping that broadcasting will provide a small performance optimization.
I've tried reshaping the arrays, but with no luck, and am stumped. Scikit is not available for me to use.
You can expand B by tiling it twice in both dimensions:
print(A - numpy.tile(B, (2, 2)))
yields
[[ 0 -1 1 0]
[-2 -3 -1 -2]
[ 2 1 3 2]
[ 0 -1 1 0]]
However for big matrices this may create a lot of overhead in RAM.
Alternatively, you can view A in blocks using scikit-image's skimage.util.view_as_blocks and modify it in place:
Atmp = skimage.util.view_as_blocks(A, block_shape=(2, 2))
Atmp -= B
print(A)
which, without needlessly repeating B, results in
[[ 0 -1 1 0]
[-2 -3 -1 -2]
[ 2 1 3 2]
[ 0 -1 1 0]]
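Without scikit-image, a plain reshape gives a similar in-place effect. This sketch assumes A is C-contiguous (so the reshape is a view rather than a copy) and that A's shape is an exact multiple of B's:

```python
import numpy as np

A = np.array([[1, 1, 2, 2],
              [1, 1, 2, 2],
              [3, 3, 4, 4],
              [3, 3, 4, 4]])
B = np.array([[1, 2],
              [3, 4]])

m1, n1 = A.shape
m2, n2 = B.shape

# For a C-contiguous A this reshape is a view, so writing through it modifies A.
A4 = A.reshape(m1 // m2, m2, n1 // n2, n2)
assert np.shares_memory(A4, A)

# B is broadcast over the two block-index axes (the length-1 axes).
A4 -= B.reshape(1, m2, 1, n2)
print(A)
```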
Approach #1 : Here's an approach using strides that builds a replicated view of B without copying B's data, and then subtracts it from A (note that calling .reshape(A.shape) on that strided view does produce a copy) -
m,n = B.strides
m1,n1 = A.shape
m2,n2 = B.shape
s1,s2 = m1//m2, n1//n2
strided = np.lib.stride_tricks.as_strided
out = A - strided(B,shape=(s1,m2,s2,n2),strides=(0,n2*n,0,n)).reshape(A.shape)
Sample run -
In [78]: A
Out[78]:
array([[29, 53, 30, 25, 92, 10],
[ 2, 20, 35, 87, 0, 9],
[46, 30, 20, 62, 79, 63],
[44, 9, 78, 33, 6, 40]])
In [79]: B
Out[79]:
array([[35, 60],
[21, 86]])
In [80]: m,n = B.strides
...: m1,n1 = A.shape
...: m2,n2 = B.shape
...: s1,s2 = m1//m2, n1//n2
...: strided = np.lib.stride_tricks.as_strided
...:
In [81]: # Replicated view
...: strided(B,shape=(s1,m2,s2,n2),strides=(0,n2*n,0,n)).reshape(A.shape)
Out[81]:
array([[35, 60, 35, 60, 35, 60],
[21, 86, 21, 86, 21, 86],
[35, 60, 35, 60, 35, 60],
[21, 86, 21, 86, 21, 86]])
In [82]: A - strided(B,shape=(s1,m2,s2,n2),strides=(0,n2*n,0,n)).reshape(A.shape)
Out[82]:
array([[ -6, -7, -5, -35, 57, -50],
[-19, -66, 14, 1, -21, -77],
[ 11, -30, -15, 2, 44, 3],
[ 23, -77, 57, -53, -15, -46]])
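np.broadcast_to builds the same kind of replicated view, with NumPy computing the zero strides itself, which avoids the risk of as_strided reading arbitrary memory when a stride is miscalculated. A sketch using the sample A and B from the run above:

```python
import numpy as np

A = np.array([[29, 53, 30, 25, 92, 10],
              [ 2, 20, 35, 87,  0,  9],
              [46, 30, 20, 62, 79, 63],
              [44,  9, 78, 33,  6, 40]])
B = np.array([[35, 60],
              [21, 86]])

m1, n1 = A.shape
m2, n2 = B.shape
s1, s2 = m1 // m2, n1 // n2

# Read-only view: the block-repeat axes (size s1 and s2) get stride 0.
B4 = np.broadcast_to(B.reshape(1, m2, 1, n2), (s1, m2, s2, n2))
out = A - B4.reshape(A.shape)   # as with as_strided, this reshape copies
print(out)
```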
Approach #2 : We can just reshape both A and B to 4D shapes with B having two singleton dimensions along which its elements would be broadcasted when subtracted from 4D version of A. After subtraction, we reshape back to 2D for final output. Thus, we would have an implementation, like so -
m1,n1 = A.shape
m2,n2 = B.shape
out = (A.reshape(m1//m2,m2,n1//n2,n2) - B.reshape(1,m2,1,n2)).reshape(m1,n1)
This should work if A's dimensions are multiples of B's dimensions:
A - np.tile(B, (A.shape[0] // B.shape[0], A.shape[1] // B.shape[1]))
And the result:
array([[ 0, -1, 1, 0],
[-2, -3, -1, -2],
[ 2, 1, 3, 2],
[ 0, -1, 1, 0]])
If you do not want to tile, you can reshape A to extract (2, 2) blocks, and use broadcasting to subtract B:
C = A.reshape(A.shape[0]//2, 2, A.shape[1]//2, 2).swapaxes(1, 2)
C - B
array([[[[ 0, -1],
[-2, -3]],
[[ 1, 0],
[-1, -2]]],
[[[ 2, 1],
[ 0, -1]],
[[ 3, 2],
[ 1, 0]]]])
And then swap the axis back and reshape:
(C - B).swapaxes(1, 2).reshape(A.shape[0], A.shape[1])
This should be faster than tiling, since C is a view on A, not a constructed array (the subtraction and the final reshape still allocate new arrays).
