I have a 1d numpy array, e.g. a=[10,12,15] and I want to extend it so that I end up with a numpy array b with the shape (3,10,15,20) filled with a so that e.g. b[:,1,1,1] is [10,12,15].
I thought of using np.repeat, but it's not clear to me how to do it.
tile will do it for you. Internally this does a repeat for each axis.
In [114]: a = np.array([10,12,15])
In [115]: A = np.tile(a.reshape(3,1,1,1),(1,10,15,20))
In [116]: A.shape
Out[116]: (3, 10, 15, 20)
In [117]: A[:,1,1,1]
Out[117]: array([10, 12, 15])
For some purposes it might be enough to just do the reshape and let broadcasting expand the dimensions as needed (without actually expanding memory use).
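If you do want an explicit array of the target shape without copying data, np.broadcast_to gives a read-only broadcast view; a minimal sketch:
import numpy as np
a = np.array([10, 12, 15])
# broadcast a (3,1,1,1) view up to (3,10,15,20); no data is copied
b = np.broadcast_to(a.reshape(3, 1, 1, 1), (3, 10, 15, 20))
b.shape        # (3, 10, 15, 20)
b[:, 1, 1, 1]  # array([10, 12, 15])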
Code:
import numpy as np
a = np.arange(1800).reshape((10, 12, 15))
# repeat along axis 0: (10,12,15) -> (50,12,15), then reshape to the target shape
b = np.repeat(a, repeats=5, axis=0).reshape((3, 10, 15, 20))
You can change axis if you want to repeat in a different fashion. To understand repeat, experiment with smaller shapes, e.g. a with shape (3,5,4) and b with shape (2,3,5,4), and repeat along different axes.
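For the original (3,10,15,20) target specifically, repeat can also be chained along each new axis, though tile is more concise; a sketch:
a = np.array([10, 12, 15]).reshape(3, 1, 1, 1)
# repeat each singleton axis out to the desired length
b = np.repeat(np.repeat(np.repeat(a, 10, axis=1), 15, axis=2), 20, axis=3)
b.shape        # (3, 10, 15, 20)
b[:, 1, 1, 1]  # array([10, 12, 15])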
I'm using numpy and want to index a row without losing the dimension information.
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:]
xslice.shape # >> (10,)
In this example xslice is now 1 dimension, but I want it to be (1,10).
In R, I would use X[10,:,drop=F]. Is there something similar in numpy? I couldn't find it in the documentation and didn't see a similar question asked.
Thanks!
Another solution is to do
X[[10],:]
or
I = np.array([10])
X[I,:]
The dimensionality of an array is preserved when indexing is performed by a list (or an array) of indexes. This is nice because it leaves you with the choice between keeping the dimension and squeezing.
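For instance, comparing the kept and squeezed shapes:
xkeep = X[[10], :]
xkeep.shape              # (1, 10) -- dimension kept
np.squeeze(xkeep).shape  # (10,)   -- squeeze when you want it dropped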
It's probably easiest to do x[None, 10, :] or equivalently (but more readable) x[np.newaxis, 10, :]. None or np.newaxis increases the dimension of the array by 1, so that you're back to the original after the slicing eliminates a dimension.
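A quick shape check, for illustration:
X[np.newaxis, 10, :].shape  # (1, 10) -- back to 2 dimensions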
As far as why it's not the default, personally, I find that constantly having arrays with singleton dimensions gets annoying very quickly. I'd guess the numpy devs felt the same way.
Also, numpy handles broadcasting arrays very well, so there's usually little reason to retain the dimension of the array the slice came from. If you did, then things like:
a = np.zeros((100,100,10))
b = np.zeros((100,10))
a[0,:,:] = b
either wouldn't work or would be much more difficult to implement.
(Or at least that's my guess at the numpy dev's reasoning behind dropping dimension info when slicing)
I found a few reasonable solutions.
1) use numpy.take(X,[10],0)
2) use this strange indexing X[10:11:, :]
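Both keep the row axis, for illustration:
np.take(X, [10], 0).shape  # (1, 10)
X[10:11:, :].shape         # (1, 10)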
Ideally, this should be the default. I never understood why dimensions are ever dropped. But that's a discussion for numpy...
Here's an alternative I like better. Instead of indexing with a single number, index with a range. That is, use X[10:11,:]. (Note that 10:11 does not include 11).
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10:11,:]
xslice.shape # >> (1,10)
This makes it easy to understand with more dimensions too: no None juggling and no figuring out which axis needs which index. Also no need to do extra bookkeeping regarding array size; just use i:i+1 for any i that you would have used in regular indexing.
b = np.ones((2, 3, 4))
b.shape # >> (2, 3, 4)
b[1:2,:,:].shape # >> (1, 3, 4)
b[:, 2:3, :].shape # >> (2, 1, 4)
To add to gnebehay's solution involving indexing by lists or arrays, it is also possible to use tuples:
X[(10,),:]
This is especially annoying if you're indexing by an array that might be length 1 at runtime. For that case, there's np.ix_:
some_array[np.ix_(row_index,column_index)]
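np.ix_ builds an open mesh from the index arrays, so the result keeps one axis per index array even when one of them has length 1; a sketch:
row_index = np.array([10])          # might happen to be length 1 at runtime
column_index = np.array([0, 3, 5])
X[np.ix_(row_index, column_index)].shape  # (1, 3)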
I've been using np.reshape to achieve the same, as shown below:
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:].reshape(1, -1)
xslice.shape # >> (1, 10)
Is there a way to reshape a dask array in Fortran-contiguous (column-major) order, since the parallelized version of the np.reshape function does not support this yet (see here)?
Fortran-contiguous (column-major) order is simply C-contiguous (row-major) order in reverse. So there's a simple work around for the fact that dask array doesn't support order='F':
Transpose your array to reverse its dimensions.
Reshape it to the reverse of your desired shape.
Transpose it back.
In a function:
def reshape_fortran(x, shape):
    return x.T.reshape(shape[::-1]).T
Transposing with NumPy/dask is basically free (it doesn't copy any data), so in principle this operation should also be quite efficient.
Here's a simple test to verify it does the right thing:
In [48]: import numpy as np
In [49]: import dask.array as da
In [50]: x = np.arange(100).reshape(10, 10)
In [51]: y = da.from_array(x, chunks=5)
In [52]: shape = (2, 5, 10)
In [53]: np.array_equal(reshape_fortran(y, shape).compute(),
    ...:                x.reshape(shape, order='F'))
Out[53]: True
I have a numpy array of shape (5,5,3,2). I want to take the element (2,3) of that array, which is itself an array of shape (3,2), and append a row to it so that it becomes a (4,2) array.
The code I'm using is the following:
import numpy as np
a = np.random.rand(5,5,3,2)
a = np.array(a, dtype=object) # so I can have different-size sub-arrays
a[2][3] = np.append(a[2][3],[[1.0,1.0]],axis=0) # a[2][3] has shape (3,2)
I'm always obtaining the error:
ValueError: could not broadcast input array from shape (4,2) into shape (3,2)
I understand that the shape returned by np.append is not the same as that of the a[2][3] sub-array, but I thought that dtype=object would solve this. However, I do need this behavior. Is there any way around this limitation?
I also tried to use the insert function but I don't know how could I add the element in the place I want.
Make sure you understand what you have produced. That requires checking the shape and dtype, and possibly looking at the values.
In [29]: a = np.random.rand(5,5,3,2)
In [30]: b=np.array(a, dtype=object)
In [31]: a.shape
Out[31]: (5, 5, 3, 2) # a is a 4d array
In [32]: a.dtype
Out[32]: dtype('float64')
In [33]: b.shape
Out[33]: (5, 5, 3, 2) # so is b
In [34]: b.dtype
Out[34]: dtype('O')
In [35]: b[2,3].shape
Out[35]: (3, 2)
In [36]: c=np.append(b[2,3],[[1,1]],axis=0)
In [37]: c.shape
Out[37]: (4, 2)
In [38]: c.dtype
Out[38]: dtype('O')
b[2][3] is also an array. b[2,3] is the proper numpy way of indexing 2 dimensions.
I suspect you wanted b to be a (5,5) array containing arrays (as objects), and you think that you can simply replace one of those with a (4,2) array. But the b constructor simply changes the floats of a to objects, without changing the shape (or 4d nature) of b.
I could construct a (5,5) object array, and fill it with values from a. And then replace one of those values with a (4,2) array:
In [39]: B=np.empty((5,5),dtype=object)
In [40]: for i in range(5):
    ...:     for j in range(5):
    ...:         B[i,j] = a[i,j,:,:]
    ...:
In [41]: B.shape
Out[41]: (5, 5)
In [42]: B.dtype
Out[42]: dtype('O')
In [43]: B[2,3]
Out[43]:
array([[ 0.03827568,  0.63411023],
       [ 0.28938383,  0.7951006 ],
       [ 0.12217603,  0.304537  ]])
In [44]: B[2,3]=c
In [46]: B[2,3].shape
Out[46]: (4, 2)
This constructor for B is a bit crude. I've answered other questions about creating/filling object arrays, but I'm not going to take the time here to streamline this case. It's for illustration purposes only.
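One way to tighten that fill loop, as a sketch, is np.ndindex:
B = np.empty((5, 5), dtype=object)
for idx in np.ndindex(B.shape):  # yields every (i, j) pair
    B[idx] = a[idx]              # each cell gets a (3,2) float array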
In an object array, any element can indeed be an array (or any other kind of object).
import numpy as np
a = np.random.rand(5,5,3,2)
a = np.array(a, dtype=object)
# Assign an 1D array to the array element ``a[2][3][0][0]``:
a[2][3][0][0] = np.arange(10)
a[2][3][0][0][9] # 9
However, a[2][3] is not an array element; it is a whole subarray.
a[2][3].ndim # 2
Therefore when you do a[2][3] = (something), you are broadcasting an assignment rather than assigning an element: numpy tries to replace the contents of the subarray a[2][3] and fails because of the shape mismatch. The memory layout of numpy arrays does not allow changing the shape of subarrays.
Edit: Instead of using numpy arrays you could use nested lists. These nested lists can have arbitrary sizes. Note that memory use and access time are higher than for numpy arrays.
import numpy as np
a = np.random.rand(5,5,3,2)
a = np.array(a, dtype=object)
b = np.append(a[2][3], [[1.0,1.0]],axis=0)
a_list = a.tolist()
a_list[2][3] = b.tolist()
The problem here is that you try to assign to a[2][3]. Make a new array instead.
new_array = np.append(a[2][3],np.array([[1.0,1.0]]),axis=0)
I have a large 2D numpy matrix that needs to be made smaller (ex: convert from 100x100 to 10x10).
My goal is essentially: break the nxn matrix into smaller mxm matrices, average the cells in these mxm slices, and then construct a new (smaller) matrix out of these mxm slices.
I'm thinking about using something like matrix[a::b, c::d] to extract the smaller matrices, and then averaging those values, but this seems overly complex. Is there a better way to accomplish this?
You could split your array into blocks with the view_as_blocks function (in scikit-image).
For a 2D array, this returns a 4D array with the blocks ordered row-wise:
>>> import skimage.util as ski
>>> import numpy as np
>>> a = np.arange(16).reshape(4,4) # 4x4 array
>>> ski.view_as_blocks(a, (2,2))
array([[[[ 0,  1],
         [ 4,  5]],

        [[ 2,  3],
         [ 6,  7]]],


       [[[ 8,  9],
         [12, 13]],

        [[10, 11],
         [14, 15]]]])
Taking the mean along the last two axes returns a 2D array with the mean in each block:
>>> ski.view_as_blocks(a, (2,2)).mean(axis=(2,3))
array([[  2.5,   4.5],
       [ 10.5,  12.5]])
Note: view_as_blocks returns a view of the array by modifying the strides (it also works with arrays with more than two dimensions). It is implemented purely in NumPy using as_strided, so if you don't have access to the scikit-image library you can copy the code from here.
Without scikit-image, you can simply reshape and take the appropriate mean.
M=np.arange(10000).reshape(100,100)
M1=M.reshape(10,10,10,10)
M2=M1.mean(axis=(1,3))
A quick check to see if I got the right axes:
In [127]: M2[0,0]
Out[127]: 454.5
In [128]: M[:10,:10].mean()
Out[128]: 454.5
In [131]: M[-10:,-10:].mean()
Out[131]: 9544.5
In [132]: M2[-1,-1]
Out[132]: 9544.5
Adding .transpose([0,2,1,3]) puts the 2 averaging dimensions at the end, as view_as_blocks does.
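For illustration, the transposed version gives the same block means with the averaging axes last:
M1.transpose(0, 2, 1, 3).mean(axis=(2, 3))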
For this (100,100) case, the reshape approach is about 2x faster than the view_as_blocks approach, but both are quite fast. A direct as_strided solution isn't much slower than reshaping:
from numpy.lib.stride_tricks import as_strided
# strides in bytes, assuming a C-contiguous int64 (8-byte) array
as_strided(M, shape=(10,10,10,10), strides=(8000,80,800,8)).mean((2,3))
as_strided(M, shape=(10,10,10,10), strides=(8000,800,80,8)).mean((1,3))
I'm coming in late but I'd recommend scipy.ndimage.zoom() as an off-the-shelf solution for this. It does down-sizing (or upsizing) using spline interpolations of arbitrary order from 0 to 5. Sounds like order 0 would be sufficient for you based on your question.
from scipy import ndimage as ndi
import numpy as np
M=np.arange(1000000).reshape(1000,1000)
shrinkby=10
Mfilt = ndi.uniform_filter(input=M, size=shrinkby)  # box-filter smoothing before decimation
Msmall = ndi.zoom(input=Mfilt, zoom=1./shrinkby, order=0)
That's all you need. It's perhaps slightly less convenient to specify a zoom rather than a desired output size, but at least for order=0 this method is very fast.
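If you'd rather specify a target output shape than a zoom factor, you can derive the per-axis factors yourself (a sketch; target_shape is an assumed name):
target_shape = (100, 100)
zoom_factors = [t / float(s) for t, s in zip(target_shape, M.shape)]  # [0.1, 0.1]
Msmall = ndi.zoom(Mfilt, zoom=zoom_factors, order=0)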
The output size is 10% of the input in each dimension, i.e.
print(M.shape, Msmall.shape)
gives (1000, 1000) (100, 100) and the speed you can get from
%timeit Mfilt = ndi.uniform_filter(input=M, size=shrinkby)
%timeit Msmall = ndi.zoom(input=Mfilt, zoom=1./shrinkby, order=0)
which on my machine gave 10 loops, best of 3: 20.5 ms per loop for the uniform_filter call and 1000 loops, best of 3: 1.67 ms per loop for the zoom call.