Using an ND array to select on a dimension - python

How do I use an ndarray matrix to select elements of an ndarray?
Here's an example of what I mean.
a = np.arange(9)
b = np.arange(5)
c = np.arange(12)
A, B, C = np.meshgrid(a, b, c, indexing='ij')
Now, for each pair of values of a and c, I want the b that comes closest to satisfying A+C=B, i.e. the b that minimizes |A+C-B|. Get the indices:
idx = np.abs(A+C-B).argmin(axis=1)
Clearly, idx has shape (9, 12) - it contains the index of b for each of the 9 a, and each of the 12 c.
Now, I would like to select the matrices with the "optimized b". That is, something along the lines of
B[:, idx, :]
that supposedly has shape (9, 1, 12) - because for each of the other combinations, it has only one value of b - the minimizing one. Now, B[:, idx, :] instead gives me the mesh of all potential combinations with shape (9, 9, 12, 12). I also tried
B[np.arange(B.shape[0]), idx, np.arange(B.shape[2])]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (9,) (9,12) (12,)
How do I get that specific type of matrix I described above?

You just need to add an axis with np.newaxis/None so that the three indexing arrays broadcast against each other under advanced indexing -
B[np.arange(B.shape[0])[:,None], idx, np.arange(B.shape[2])]
The idea basically is to map the rows of idx with the first indexing array of np.arange(B.shape[0]) and as such we need to add an axis there. For mapping the columns of idx, we already have np.arange(B.shape[2]) aligned along the columns of it.
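As a quick check of the idea above, here is a minimal sketch that compares the broadcasted index with an explicit double loop. The names rows, cols, picked, expected and kept are illustrative only. It also shows np.take_along_axis as one possible way to keep the selected axis with length 1, which is the (9, 1, 12) shape the question asked about.

```python
import numpy as np

a = np.arange(9)
b = np.arange(5)
c = np.arange(12)
A, B, C = np.meshgrid(a, b, c, indexing='ij')
idx = np.abs(A + C - B).argmin(axis=1)        # shape (9, 12)

rows = np.arange(B.shape[0])[:, None]         # shape (9, 1)
cols = np.arange(B.shape[2])                  # shape (12,)
picked = B[rows, idx, cols]                   # index shapes broadcast to (9, 12)

# Reference result from an explicit double loop.
expected = np.empty_like(picked)
for i in range(B.shape[0]):
    for k in range(B.shape[2]):
        expected[i, k] = B[i, idx[i, k], k]

# Alternative: take_along_axis keeps the selected axis with length 1.
kept = np.take_along_axis(B, idx[:, None, :], axis=1)   # shape (9, 1, 12)
```

Both picked and kept hold the same values; they differ only in whether the b axis survives as a length-1 dimension.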
Alternative to np.newaxis
Another way to add that new axis would be with reshaping. Thus, we could replace np.arange(B.shape[0])[:,None] with np.arange(B.shape[0]).reshape(-1,1).
Further optimization
We could optimize the code by using open (broadcastable) arrays in place of the huge arrays created by meshgrid, like so -
A0, B0, C0 = np.ix_(a,b,c)
idx = np.abs(A0+C0-B0).argmin(axis=1)
Thus, get the final output, like so -
B[np.arange(len(a))[:,None], idx, np.arange(len(c))]
Just to give ourselves the idea of memory saving here -
In [47]: A.nbytes + B.nbytes + C.nbytes
Out[47]: 12960
whereas A0, B0, C0 are views into the input arrays a, b, c respectively and as such don't occupy any additional memory, i.e. absolutely free -
In [49]: np.shares_memory(a,A0)
Out[49]: True
For completeness' sake, a direct way to get idx would be -
np.abs(a[:,None,None]+c-b[:,None]).argmin(axis=1)
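The three ways of computing idx shown above should agree, which the following sketch checks. The names idx_full, idx_open, idx_direct and best_b are illustrative. It also relies on the observation that B[i, j, k] equals b[j] for a meshgrid built with indexing='ij', so, under that assumption, the selected values can be read straight from b without the full B.

```python
import numpy as np

a = np.arange(9)
b = np.arange(5)
c = np.arange(12)

# Full meshgrid version from the question.
A, B, C = np.meshgrid(a, b, c, indexing='ij')
idx_full = np.abs(A + C - B).argmin(axis=1)

# Open arrays: shapes (9, 1, 1), (1, 5, 1) and (1, 1, 12).
A0, B0, C0 = np.ix_(a, b, c)
idx_open = np.abs(A0 + C0 - B0).argmin(axis=1)

# Direct broadcasting on the 1-D inputs.
idx_direct = np.abs(a[:, None, None] + c - b[:, None]).argmin(axis=1)

# Since B[i, j, k] == b[j], the minimizing b values come straight from b.
best_b = b[idx_open]                          # shape (9, 12)
```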

Related

How to select 3d array elements with another 3-d array of coordinates?

Let's say I have two arrays, one with all zero elements and a second one with the indices that should be filled with 1. This can be done with the following code:
A = np.zeros((100, 50))
B = np.concatenate([np.random.randint(low=0, high=99, size=(10, 1)),
np.random.randint(low=0, high=49, size=(10, 1))],axis=1)
A[B[:, 0], B[:, 1]] = 1
However, this gets trickier when adding another dimension. Now my A array has shape (6, 100, 50) and my B array of coordinates has shape (6, 10, 2):
A = np.zeros((6, 100, 50))
B = []
for i in range(6):
    B0 = np.concatenate([np.random.randint(low=0, high=99, size=(10, 1)),
                         np.random.randint(low=0, high=49, size=(10, 1))], axis=1)
    B.append(B0)
B = np.stack(B)
How can I then select elements of A with the coordinates stored in B? The first dimension is the same in both arrays, and B[i] contains the coordinates for the matrix A[i].
Indexing specific values in a 3D array has to be done with 3 values, so the position of each subarray in B can't be used as an index as it stands. There are 2 ways that I'm aware of to get the index from the subarray position.
One way is to use a for loop to iterate through each subarray in B, but personally, I prefer explicitly creating a column in B with the first dimension index (though I recognize that this could eat up memory if B is very large). Here's how adding a column and indexing with that would work:
B_shape = B.shape
num_dimensions = 3
first_dimension = B_shape[0]
second_dimension = B_shape[1]
third_dimension = B_shape[2]
# Create a new column in B stating the index of the first dimension,
# rather than relying on the position as a stand-in. Flatten B so there are no subarrays,
# only rows of 3.
indices = np.repeat(np.arange(first_dimension)[:,None,None], second_dimension, axis=1)
B2 = np.concatenate((indices, B), axis=2).reshape(first_dimension*second_dimension, third_dimension + 1)
# Index A using the new B2 array.
A2 = A[tuple(B2.T)]
Let me know if you have any questions.
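As an alternative sketch (not the approach above, which builds an explicit index column), broadcasting can supply the first-dimension index without materialising B2. The names rng, first, selected and mask are illustrative, and the random data is only a stand-in for the question's arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 100, 50))
# Coordinates with shape (6, 10, 2): row index in [0, 100), column index in [0, 50).
B = np.stack([rng.integers(0, 100, size=(6, 10)),
              rng.integers(0, 50, size=(6, 10))], axis=-1)

first = np.arange(A.shape[0])[:, None]        # shape (6, 1), broadcasts against (6, 10)
selected = A[first, B[..., 0], B[..., 1]]     # shape (6, 10)

# The same index expression also works for assignment.
mask = np.zeros_like(A)
mask[first, B[..., 0], B[..., 1]] = 1
```

Here selected[i, j] is A[i, B[i, j, 0], B[i, j, 1]], so it holds the same values as the flattened A2 above, only arranged as (6, 10).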

add column Numpy array python

I am very new to python and am very familiar with R, but my question is very simple using Numpy Arrays:
Observe:
I have one array X of dimension (100,2) of floating point type and I want to add a 3rd column, preferably into a new Numpy array of dimension (100,3) such that the 3rd column = col(1)^2 for every row in array of X.
My understanding is Numpy arrays are generally of fixed dimension so I'm OK with creating a new array of dim 100x3, I just don't know how to do so using Numpy arrays.
Thanks!
One way to do this is by creating a new array and then concatenating it. For instance, say that M is currently your array.
You can compute col(1)^2 as C = M[:,0] ** 2 (which I'm interpreting as column 1 squared, not column 1 to the power of the values in column two). C will now be an array with shape (100, ), so we can reshape it using C = np.expand_dims(C, 1) which will create a new axis of length 1, so our new column now has shape (100, 1). This is important because we want both of our arrays to have the same number of dimensions when concatenating them.
The last step here is to concatenate them using np.concatenate. In total, our result looks like this
C = M[:, 0] ** 2
C = np.expand_dims(C, 1)
M = np.concatenate([M, C], axis=1) # third column will now be col(1) ^ 2
If you're the kind of person who likes to do things in one line, you have:
M = np.concatenate([M, np.expand_dims(M[:, 0] ** 2, 1)], axis=1)
That being said, I would recommend looking at Pandas, it supports these actions more naturally, in my opinion. In Pandas, it would be
M["your_col_3_name"] = M["your_col_1_name"] ** 2
where M is a pandas dataframe.
Append with axis=1 should work.
a = np.zeros((5,2))
b = np.ones((5,1))
print(np.append(a,b,axis=1))
This should return:
[[0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]]
# generate an array with shape (100,2), fill with 2.
a = np.full((100,2),2)
# calculate the square of the first column, this will be a 1-d array.
squared=a[:,0]**2
# concatenate the 1-d array to a,
# first need to convert it to a 2-d array with shape (100,1) by reshape(-1,1)
c = np.concatenate((a,squared.reshape(-1,1)),axis=1)
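Two further variants, offered only as a sketch: np.column_stack accepts the 1-D squared column directly, and slicing with 0:1 instead of 0 keeps the column 2-D so no reshape is needed. X here is made-up sample data standing in for the (100, 2) array from the question.

```python
import numpy as np

# Stand-in data with shape (100, 2); any float array of that shape would do.
X = np.arange(200, dtype=float).reshape(100, 2)

# column_stack turns the 1-D array of squares into a column automatically.
X3 = np.column_stack((X, X[:, 0] ** 2))       # shape (100, 3)

# X[:, :1] has shape (100, 1), so it can be stacked without reshaping.
X3_alt = np.hstack((X, X[:, :1] ** 2))        # shape (100, 3)
```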

How to indexing multi-dimensional arrays given by indices in a certain axis?

Let's say I have a 4d array A with shape (D0, D1, D2, D3). I have a 1d array B with shape (D0,), which includes the indices I need at axis 2.
The trivial way to implement what I need:
output_lis = []
for i in range(D0):
    output_lis.append(A[i, :, B[i], :])
#output = np.concatenate(output_lis, axis=0) #it is wrong to use concatenate. Thanks to #Mad Physicist. Instead, using stack.
output = np.stack(output_lis, axis=0) #shape: [D0, D1, D3]
So, my question is how to implement it with numpy API in a fast way?
Use fancy indexing to step along two dimensions in lockstep. In this case, arange provides the sequence i, while B provides the sequence B[i]:
A[np.arange(D0), :, B, :]
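The shape of this array is (D0, D1, D3): because the two index arrays are separated by a slice, NumPy puts their broadcast dimension first, followed by the sliced dimensions. That differs from the original concatenate-based loop, which joined the (D1, D3) pieces along an existing axis.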
To get the same result from your example, use stack (which adds a new axis), rather than concatenate (which uses an existing axis):
output = np.stack(output_lis, axis=0)
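A small sketch to confirm that the fancy-indexing expression matches the stacked loop. The dimension sizes and random data are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D0, D1, D2, D3 = 4, 3, 5, 2                   # arbitrary sizes for the demo
A = rng.random((D0, D1, D2, D3))
B = rng.integers(0, D2, size=D0)              # one axis-2 index per entry of axis 0

# Loop version from the question, combined with stack.
looped = np.stack([A[i, :, B[i], :] for i in range(D0)], axis=0)

# Vectorised version: arange and B are stepped through in lockstep.
fancy = A[np.arange(D0), :, B, :]
```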

How to use Numpy Matrix operation to calculate multiple samples at once?

How do I use Numpy matrix operations to calculate over multiple vector samples at once?
Please see below the code I came up with, 'd' is the outcome I'm trying to get. But this is only one sample. How do I calculate the output without doing something like repeat the code for every sample OR looping through every sample?
a = np.array([[1, 2, 3]])
b = np.array([[1, 2, 3]])
c = np.array([[1, 2, 3]])
d = ((a.T * b).flatten() * c.T)
a1 = np.array([[2, 3, 4]])
b1 = np.array([[2, 3, 4]])
c1 = np.array([[2, 3, 4]])
d1 = ((a1.T * b1).flatten() * c1.T)
a2 = np.array([[3, 4, 5]])
b2 = np.array([[3, 4, 5]])
c2 = np.array([[3, 4, 5]])
d2 = ((a2.T * b2).flatten() * c2.T)
The way broadcasting works is to repeat your data along an axis of size one as many times as necessary to make your element-wise operation work. That is what is happening to axis 1 of a.T and axis 0 of b. Similar for the product of the result. My recommendation would be to concatenate all your inputs along another dimension, to allow broadcasting to happen along the existing two.
Before showing how to do that, let me just mention that you would be much better off using ravel instead of flatten in your example. flatten always makes a copy of the data, while ravel returns a view whenever possible. Since a.T * b is a temporary matrix anyway, there is really no reason to make the copy.
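The difference between the two can be checked with np.shares_memory. This is only an illustrative sketch; the variable names are made up for the demo. The last line shows a case where ravel has to copy as well, because a transposed array can't be flattened in C order without rearranging its data.

```python
import numpy as np

a = np.array([[1, 2, 3]])
b = np.array([[1, 2, 3]])
x = a.T * b                                   # fresh C-contiguous 3x3 array

ravel_is_view = np.shares_memory(x, x.ravel())        # ravel reuses x's buffer
flatten_is_view = np.shares_memory(x, x.flatten())    # flatten always copies

# x.T is not C-contiguous, so ravel must copy here too.
ravel_of_transpose_is_view = np.shares_memory(x, x.T.ravel())
```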
The easiest way to combine some arrays along a new dimension is np.stack. I would recommend combining along the first dimension for a couple of reasons. It's the default for stack and your result can be indexed more easily: d[0] will be d, d[1] will be d1, etc. If you ever add matrix multiplication into your pipeline, np.matmul (the @ operator) will work out of the box since it operates on the last two dimensions and broadcasts over the leading ones.
a = np.stack((a0, a1, a2, ..., aN))
b = np.stack((b0, b1, b2, ..., bN))
c = np.stack((c0, c1, c2, ..., cN))
Now a, b and c are all 3D arrays. The first dimension is the measurement index; the second and third correspond to the two dimensions of the original arrays.
With this structure, what you called transpose before is just swapping the last two dimensions (since one of them is 1), and raveling/flattening is just multiplying out the last two dimensions, e.g. with reshape:
d = (a.reshape(N, -1, 1) * b).reshape(N, 1, -1) * c.reshape(N, -1, 1)
If you set one of the dimensions to have size -1 in the reshape, it will absorb the remaining size. In this case, all your arrays have 3 elements, so the -1 will be equivalent to 3.
You have to be a little careful when you convert the ravel operation to 3D. In 2D, x.ravel() * c.T implicitly treats x as a 1x9 array before broadcasting. In 3D, x.reshape(3, -1) creates a 2D 3x9 array, which you multiply by c.reshape(3, -1, 1), which is 3x3x1. Broadcasting rules state that you are effectively multiplying a 1x3x9 array by a 3x3x1, but you really want to multiply a 3x1x9 array by the 3x3x1, so you need to specify all three axes for the 3D "ravel" explicitly.
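Putting the pieces together, here is a runnable sketch using the three samples from the question. In each sample a, b and c hold the same values, so one stacked array is reused for all three. The names samples and reference are illustrative.

```python
import numpy as np

samples = [np.array([[1, 2, 3]]),
           np.array([[2, 3, 4]]),
           np.array([[3, 4, 5]])]

# Per-sample reference, computed the same way as d, d1 and d2 in the question.
reference = [(s.T * s).ravel() * s.T for s in samples]    # each has shape (3, 9)

# Stack along a new first axis: shape (3, 1, 3).
a = np.stack(samples)
b = a
c = a
N = a.shape[0]

# (N, 3, 1) * (N, 1, 3) -> (N, 3, 3) -> (N, 1, 9), then * (N, 3, 1) -> (N, 3, 9)
d = (a.reshape(N, -1, 1) * b).reshape(N, 1, -1) * c.reshape(N, -1, 1)
```

With this layout d[0], d[1] and d[2] correspond to d, d1 and d2 from the question.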
Here is an IDEOne link with your sample data for you to play with: https://ideone.com/p8vTlx

Elementwise multiplication of tensors of unknown dimension

How do I do an elementwise multiplication of tensors with the following shapes? The second array here is always assumed to be 2D.
[x, y, ?, ?, ?, ...] * [x, y]
I want to broadcast over all the dimensions marked ?, of which I don't know the number a priori. Possible solutions I have considered (but don't know how to do):
Add a variable number of axes to the second array
Reverse the order of the axes of both arrays and then reverse them back again
Any pointers would be great.
The alternatives mentioned in the question (with b the 2D array):
Add a variable number of axes to the second array
a * b.reshape(b.shape + (1,)*(a.ndim-b.ndim))
Reverse the order of the axes of both arrays and then reverse them back again
(a.T * b.T).T
Another alternative with einsum:
numpy.einsum('ij...,ij->ij...', a, b)
Not pretty, but it works:
a = np.zeros((3, 4, 5, 6))
b = np.zeros((3, 4))
c = a*b[(slice(None), slice(None), )+(None, )*(a.ndim-2)]
Let's say the input arrays are A, B with B as the 2D array. To start off, reshape A to a 3D array with the trailing non-matching dimensions merged as one dimension, then perform the broadcasted elementwise multiplication with B and finally reshape back the product to original shape of A. The implementation would look like this -
shp = A.shape # Get shape of A
out = (A.reshape(shp[0],shp[1],-1)*B[:,:,None]).reshape(shp)
Verify output -
In [96]: A = np.random.rand(2,3,4,5,7,8,4)
In [97]: B = np.random.rand(2,3)
In [98]: shp = A.shape
...: out = (A.reshape(shp[0],shp[1],-1)*B[:,:,None]).reshape(shp)
...:
In [99]: direct_out = A*B[:,:,None,None,None,None,None]
In [100]: np.allclose(out,direct_out) # Verify
Out[100]: True
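To see that the alternatives in these answers agree, the following sketch runs each of them on the same random input and compares the results with the explicit form for a 4D array. The by_* names and the chosen shapes are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 4, 5, 6))
b = rng.random((3, 4))

# Explicit form, valid only because a is known to be 4D here.
direct = a * b[:, :, None, None]

# 1. Append as many length-1 axes to b as a has extra dimensions.
by_reshape = a * b.reshape(b.shape + (1,) * (a.ndim - b.ndim))

# 2. Reverse the axes so the shared ones trail, multiply, reverse back.
by_transpose = (a.T * b.T).T

# 3. einsum with an ellipsis for the unknown trailing axes.
by_einsum = np.einsum('ij...,ij->ij...', a, b)

# 4. Build the index tuple of slices and None entries programmatically.
by_index = a * b[(slice(None), slice(None)) + (None,) * (a.ndim - 2)]

# 5. Merge the trailing axes, multiply, and restore the original shape.
shp = a.shape
by_merge = (a.reshape(shp[0], shp[1], -1) * b[:, :, None]).reshape(shp)
```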
