Numpy index slice without losing dimension information - python

I'm using numpy and want to index a row without losing the dimension information.
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:]
xslice.shape # >> (10,)
In this example xslice is now 1 dimension, but I want it to be (1,10).
In R, I would use X[10,:,drop=F]. Is there something similar in numpy. I couldn't find it in the documentation and didn't see a similar question asked.
Thanks!

Another solution is to do
X[[10],:]
or
I = array([10])
X[I,:]
The dimensionality of an array is preserved when indexing is performed by a list (or an array) of indexes. This is nice because it leaves you with the choice between keeping the dimension and squeezing.

It's probably easiest to do x[None, 10, :] or equivalently (but more readable) x[np.newaxis, 10, :]. None or np.newaxis increases the dimension of the array by 1, so that you're back to the original after the slicing eliminates a dimension.
As far as why it's not the default, personally, I find that constantly having arrays with singleton dimensions gets annoying very quickly. I'd guess the numpy devs felt the same way.
Also, numpy handle broadcasting arrays very well, so there's usually little reason to retain the dimension of the array the slice came from. If you did, then things like:
a = np.zeros((100,100,10))
b = np.zeros(100,10)
a[0,:,:] = b
either wouldn't work or would be much more difficult to implement.
(Or at least that's my guess at the numpy dev's reasoning behind dropping dimension info when slicing)

I found a few reasonable solutions.
1) use numpy.take(X,[10],0)
2) use this strange indexing X[10:11:, :]
Ideally, this should be the default. I never understood why dimensions are ever dropped. But that's a discussion for numpy...

Here's an alternative I like better. Instead of indexing with a single number, index with a range. That is, use X[10:11,:]. (Note that 10:11 does not include 11).
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10:11,:]
xslice.shape # >> (1,10)
This makes it easy to understand with more dimensions too, no None juggling and figuring out which axis to use which index. Also no need to do extra bookkeeping regarding array size, just i:i+1 for any i that you would have used in regular indexing.
b = np.ones((2, 3, 4))
b.shape # >> (2, 3, 4)
b[1:2,:,:].shape # >> (1, 3, 4)
b[:, 2:3, :].shape . # >> (2, 1, 4)

To add to the solution involving indexing by lists or arrays by gnebehay, it is also possible to use tuples:
X[(10,),:]

This is especially annoying if you're indexing by an array that might be length 1 at runtime. For that case, there's np.ix_:
some_array[np.ix_(row_index,column_index)]

I've been using np.reshape to achieve the same as shown below
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:].reshape(1, -1)
xslice.shape # >> (1, 10)

Related

How to efficiently broadcast multiplication between arrays of shapes (n,m,k) and (n,m)

Let a be a numpy array of shape (n,m,k) and a_msk is an array of shape (n,m) containing that masks elements from a through multiplication.
Up to my knowledge, I had to create a new axis in a_msk in order to make it compatible with a for multiplication.
b = a * a_msk[:,:,np.newaxis]
Unfortunately, my Google Colab runtime is running out of memory at this very operation given the large size of the arrays.
My question is whether I can achieve the same thing without creating that new axis for the mask array.
As #hpaulj commented adding an axis to make the two arrays "compatible" for broadcasting is the most straightforward way to do your multiplication.
Alternatively, you can move the last axis of your array a to the front which would also make the two arrays compatible (I wonder though whether this would solve your memory issue):
a = np.moveaxis(a, -1, 0)
Then you can simply multiply:
b = a * a_msk
However, to get your result you have to move the axis back:
b = np.moveaxis(b, 0, -1)
Example: both solutions return the same answer:
import numpy as np
a = np.arange(24).reshape(2, 3, 4)
a_msk = np.arange(6).reshape(2, 3)
print(f'newaxis solution:\n {a * a_msk[..., np.newaxis]}')
print()
print(f'moveaxis solution:\n {np.moveaxis((np.moveaxis(a, -1, 0) * a_msk), 0, -1)}')

Extend 1d numpy array in multiple dimensions

I have a 1d numpy array, e.g. a=[10,12,15] and I want to extend it so that I end up with a numpy array b with the shape (3,10,15,20) filled with a so that e.g. b[:,1,1,1] is [10,12,15].
I thought of using np.repeat but it's not clear to me how to do ?
tile will do it for you. Internally this does a repeat for each axis.
In [114]: a = np.array([10,12,15])
In [115]: A = np.tile(a.reshape(3,1,1,1),(1,10,15,20))
In [116]: A.shape
Out[116]: (3, 10, 15, 20)
In [117]: A[:,1,1,1]
Out[117]: array([10, 12, 15])
For some purposes it might be enough to just do the reshape and let broadcasting expand the dimensions as needed (without actually expanding memory use).
Code:
import numpy as np
a = np.arange(1800).reshape((10,12,15))
b = np.repeat(a, repeats=5, axis=0).reshape(((3,10,15,20)))
You can change axis if you want to repeat in a different fashion. To understand repeat use lower shape for e.g. a(3,5,4) and b (2,3,5,4) and repeat on different axis.

Selecting last column of a Numpy array while maintaining then umber of dimensions? [duplicate]

I'm using numpy and want to index a row without losing the dimension information.
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:]
xslice.shape # >> (10,)
In this example xslice is now 1 dimension, but I want it to be (1,10).
In R, I would use X[10,:,drop=F]. Is there something similar in numpy. I couldn't find it in the documentation and didn't see a similar question asked.
Thanks!
Another solution is to do
X[[10],:]
or
I = array([10])
X[I,:]
The dimensionality of an array is preserved when indexing is performed by a list (or an array) of indexes. This is nice because it leaves you with the choice between keeping the dimension and squeezing.
It's probably easiest to do x[None, 10, :] or equivalently (but more readable) x[np.newaxis, 10, :]. None or np.newaxis increases the dimension of the array by 1, so that you're back to the original after the slicing eliminates a dimension.
As far as why it's not the default, personally, I find that constantly having arrays with singleton dimensions gets annoying very quickly. I'd guess the numpy devs felt the same way.
Also, numpy handle broadcasting arrays very well, so there's usually little reason to retain the dimension of the array the slice came from. If you did, then things like:
a = np.zeros((100,100,10))
b = np.zeros(100,10)
a[0,:,:] = b
either wouldn't work or would be much more difficult to implement.
(Or at least that's my guess at the numpy dev's reasoning behind dropping dimension info when slicing)
I found a few reasonable solutions.
1) use numpy.take(X,[10],0)
2) use this strange indexing X[10:11:, :]
Ideally, this should be the default. I never understood why dimensions are ever dropped. But that's a discussion for numpy...
Here's an alternative I like better. Instead of indexing with a single number, index with a range. That is, use X[10:11,:]. (Note that 10:11 does not include 11).
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10:11,:]
xslice.shape # >> (1,10)
This makes it easy to understand with more dimensions too, no None juggling and figuring out which axis to use which index. Also no need to do extra bookkeeping regarding array size, just i:i+1 for any i that you would have used in regular indexing.
b = np.ones((2, 3, 4))
b.shape # >> (2, 3, 4)
b[1:2,:,:].shape # >> (1, 3, 4)
b[:, 2:3, :].shape . # >> (2, 1, 4)
To add to the solution involving indexing by lists or arrays by gnebehay, it is also possible to use tuples:
X[(10,),:]
This is especially annoying if you're indexing by an array that might be length 1 at runtime. For that case, there's np.ix_:
some_array[np.ix_(row_index,column_index)]
I've been using np.reshape to achieve the same as shown below
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:].reshape(1, -1)
xslice.shape # >> (1, 10)

Dealing with dimension collapse in python arrays

A recurring error I run into when using NumPy is that an attempt to index an array fails because one of the dimensions of the array was a singleton, and thus that dimension got wiped out and can't be indexed. This is especially problematic in functions designed to operate on arrays of arbitrary size. I'm looking for the cheapest, most universal way to avoid this error.
Here's an example:
import numpy as np
f = (lambda t, u, i=0: t[:,i]*u[::-1])
a = np.eye(3)
b = np.array([1,2,3])
f(a,b)
f(a[:,0],b[1])
The first call works as expected. The second call fails in two ways: 1) t can't be indexed by [:,0] because is has shape (3,), and 2) u can't be indexed at all because it's a scalar.
Here are the fixes that occur to me:
1) Use np.atleast_1d and np.atleast_2d etc. (possibly with conditionals to make sure that the dimensions are in the right order) inside f to make sure that all parameters have the dimensions they need. This precludes use of lambdas, and can take a few lines that I would rather not need.
2) Instead of writing f(a[:,0],b[1]) above, use f(a[:,[0]],b[[1]]). This is fine, but I always have to remember to put in the extra brackets, and if the index is stored in a variable you might not know if you should put the extra brackets in or not. E.g.:
idx = 1
f(a[:,[0]],b[[idx]])
idx = [2,0,1]
f(a[:,[0]],b[idx])
In this case, you would seem to have to call np.atleast_1d on idx first, which may be even more cumbersome than putting np.atleast_1d in the function.
3) In some cases I can get away with just not putting in an index. E.g.:
f = lambda t, u: t[0]*u
f(a,b)
f(a[:,0],b[0])
This works, and is apparently the slickest solution when it applies. But it doesn't help in every case (in particular, your dimensions have to be in the right order to begin with).
So, are there better approaches than the above?
There are lots of ways to avoid this behaviour.
First, whenever you index into a dimension of an np.ndarray with a slice rather than an integer, the number of dimensions of the output will be the same as that of the input:
import numpy as np
x = np.arange(12).reshape(3, 4)
print x[:, 0].shape # integer indexing
# (3,)
print x[:, 0:1].shape # slice
# (3, 1)
This is my preferred way of avoiding the problem, since it generalizes very easily from single-element to multi-element selections (e.g. x[:, i:i+1] vs x[:, i:i+n]).
As you've already touched on, you can also avoid dimension loss by using any sequence of integers to index into a dimension:
print x[:, [0]].shape # list
# (3, 1)
print x[:, (0,)].shape # tuple
# (3, 1)
print x[:, np.array((0,))].shape # array
# (3, 1)
If you choose to stick with integer indices, you can always insert a new singleton dimension using np.newaxis (or equivalently, None):
print x[:, 0][:, np.newaxis]
# (3, 1)
print x[:, 0][:, None]
# (3, 1)
Or else you could manually reshape it to the correct size (here using -1 to infer the size of the first dimension automatically):
print x[:, 0].reshape(-1, 1).shape
# (3, 1)
Finally, you can use an np.matrix rather than an np.ndarray. np.matrix behaves more like a MATLAB matrix, where singleton dimensions are left in whenever you index with an integer:
y = np.matrix(x)
print y[:, 0].shape
# (3, 1)
However, you should be aware that there are a number of other important differences between np.matrix and np.ndarray, for example the * operator performs elementwise multiplication on arrays, but matrix multiplication on matrices. In most circumstances it's best to stick to np.ndarrays.

python - repeating numpy array without replicating data

This question has been asked before, but the solution only works for 1D/2D arrays, and I need a more general answer.
How do you create a repeating array without replicating the data? This strikes me as something of general use, as it would help to vectorize python operations without the memory hit.
More specifically, I have a (y,x) array, which I want to tile multiple times to create a (z,y,x) array. I can do this with numpy.tile(array, (nz,1,1)), but I run out of memory. My specific case has x=1500, y=2000, z=700.
One simple trick is to use np.broadcast_arrays to broadcast your (x, y) against a z-long vector in the first dimension:
import numpy as np
M = np.arange(1500*2000).reshape(1500, 2000)
z = np.zeros(700)
# broadcasting over the first dimension
_, M_broadcast = np.broadcast_arrays(z[:, None, None], M[None, ...])
print M_broadcast.shape, M_broadcast.flags.owndata
# (700, 1500, 2000), False
To generalize the stride_tricks method given for a 1D array in this answer, you just need to include the shape and stride length for each dimension of your output array:
M_strided = np.lib.stride_tricks.as_strided(
M, # input array
(700, M.shape[0], M.shape[1]), # output dimensions
(0, M.strides[0], M.strides[1]) # stride length in bytes
)

Categories

Resources