How to randomly shuffle blocks in numpy 3D array on particular axis - python

I have a 3D numpy array and I want to shuffle it block-wise along a particular axis while keeping the data within each block in its original state. For instance, I have an array of shape (50, 140, 23) and I want to shuffle it in blocks of (50, 1, 23) along axis=1. So 140 blocks will be created, and the blocks should be shuffled along axis=1 while the data inside each block keeps its original order. I read the documentation for np.random.shuffle(x), but it only shuffles along the first axis and we can't provide a block size to it.
Is there any function in numpy or a quick way to do this?

You can index with a random permutation of the axis-1 indices:
import numpy as np
# test array of shape (50, 140, 23) whose entries equal their axis-1 index
A = sum(np.ogrid[0:0:50j, :140, 0:0:23j])
rng = np.random.default_rng()
Ashuff = A[:, rng.permutation(140), :]
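On sufficiently recent numpy versions, Generator.permutation also takes an axis argument, which does the same thing without building the index array explicitly:
Ashuff = rng.permutation(A, axis=1) # permutes whole (50, 1, 23) blocks; needs a numpy with the axis keyword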

Perhaps swapping axes, shuffling, and swapping back might do the trick for you?
import numpy as np
a = np.random.random((50, 140, 23))
b = np.swapaxes(a, 0, 1) # view of shape (140, 50, 23)
np.random.shuffle(b) # shuffles in place along the first axis of the view
c = np.swapaxes(b, 0, 1) # back to (50, 140, 23)
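A quick sanity check on a labelled array (a sketch, not part of the original answer) confirms that each (50, 1, 23) block survives intact:
# fill block j with the constant label j
check = np.broadcast_to(np.arange(140)[None, :, None], (50, 140, 23)).copy()
b = np.swapaxes(check, 0, 1)
np.random.shuffle(b) # b is a view, so this shuffles axis 1 of check in place
# every block is still constant, and all 140 labels are still present exactly once
assert all((check[:, j, :] == check[0, j, 0]).all() for j in range(140))
assert sorted(check[0, :, 0]) == list(range(140))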

Related

Generate bootstrap sample from ndarray

Is there a way to generate a bootstrap sample on an N-dimensional array? I am limited to using numpy==1.19.4
I have already tried using a for loop on the other dimensions to no avail, but the following works for 1-dimensional arrays.
import numpy as np
# Set the random seed and number of resamples
random_state = 42 # the original snippet left random_state undefined; any fixed seed works
np.random.seed(random_state)
n_resamples = 9999
# Generate data
data_1d = np.arange(2, 3, 0.1)
data_nd = np.random.default_rng(42).random((2,3,2))
data = data_1d.copy()
# Resample the data with replacement, computing the test statistic for each set of resamples
bs_samples = [np.std(np.random.choice(data, size=len(data))) for _ in range(n_resamples)]
If I understand your problem correctly, I usually apply this method.
Suppose you have this multi-dimensional array:
data_nd = np.random.rand(100, 3, 2)
data_nd.shape # (100, 3, 2)
You can draw bootstrap samples from it in this way:
n_resamples = 99
data_nd[np.random.randint(len(data_nd), size=len(data_nd)*n_resamples)].reshape(n_resamples, *data_nd.shape).shape
# (99, 100, 3, 2)
What I'm doing is randomly drawing indices with replacement (randint) and finally reshaping the result to obtain 99 bootstrapped datasets with the same dimensions as the original one.
Note that this procedure treats the sub-arrays along the first axis as the "elements", so each element you sample has shape (3, 2).
I hope that is clear, but if you have any doubts please let me know.
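To mirror the 1-D example's per-resample test statistic, the same indexing can feed a single vectorized reduction (a sketch; using np.std as the statistic is just carried over from the question):
idx = np.random.randint(len(data_nd), size=len(data_nd) * n_resamples)
samples = data_nd[idx].reshape(n_resamples, *data_nd.shape) # (99, 100, 3, 2)
bs_stats = samples.std(axis=(1, 2, 3)) # one statistic per bootstrapped dataset
bs_stats.shape # (99,)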

Converting a 2D segmentation map to n dimensional numpy masks

I have a segmentation map with 10 classes: a numpy array of size (m, n, 1) where every element is a number from 1 to 10 specifying the class the pixel belongs to. I want to convert it to an array of size (m, n, 10) where each channel is a mask for the elements of that specific class. I can do it using a for loop like this:
mask = np.zeros((m, n, 10), dtype=bool) # allocate the masks first
for i in range(10):
    mask[:, :, i] = (seg_map == i)[:, :, 0]
but I need a faster way to do this. The for loop takes too much time. Is there any built-in function that can outperform the for loop?
Thanks in advance.
One approach:
import numpy as np
np.random.seed(42)
# toy data
data = np.random.randint(0, 10, 20).reshape((5, 4, 1))
# https://stackoverflow.com/a/37323404/4001592
n_values = 10
encoded = np.eye(n_values)[data.ravel()].reshape((5, 4, 10))
One way to check the output is to confirm that the one-hot encoding maps back to the original class index, as below:
match = np.allclose(data.reshape(5, 4), encoded.argmax(-1))
print(match)
Output
True
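For completeness, a plain broadcasting comparison reaches the same masks without building an identity matrix (a sketch, not part of the original answer; it assumes class ids 0-9 as in the loop above):
import numpy as np
seg_map = np.random.randint(0, 10, (5, 4, 1)) # toy (m, n, 1) segmentation map
mask = seg_map == np.arange(10) # broadcasts (5, 4, 1) against (10,) to (5, 4, 10)
mask.shape # (5, 4, 10)
np.allclose(seg_map[..., 0], mask.argmax(-1)) # True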

Selecting last column of a Numpy array while maintaining the number of dimensions? [duplicate]

I'm using numpy and want to index a row without losing the dimension information.
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:]
xslice.shape # >> (10,)
In this example xslice is now 1 dimension, but I want it to be (1,10).
In R, I would use X[10,:,drop=F]. Is there something similar in numpy? I couldn't find it in the documentation and didn't see a similar question asked.
Thanks!
Another solution is to do
X[[10],:]
or
I = np.array([10])
X[I,:]
The dimensionality of an array is preserved when indexing is performed by a list (or an array) of indexes. This is nice because it leaves you with the choice between keeping the dimension and squeezing.
It's probably easiest to do x[None, 10, :] or equivalently (but more readable) x[np.newaxis, 10, :]. None or np.newaxis increases the dimension of the array by 1, so that you're back to the original after the slicing eliminates a dimension.
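A minimal illustration:
import numpy as np
X = np.zeros((100, 10))
X[np.newaxis, 10, :].shape # >> (1, 10)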
As far as why it's not the default, personally, I find that constantly having arrays with singleton dimensions gets annoying very quickly. I'd guess the numpy devs felt the same way.
Also, numpy handles broadcasting very well, so there's usually little reason to retain the dimension of the array the slice came from. If you did, then things like:
a = np.zeros((100,100,10))
b = np.zeros((100,10))
a[0,:,:] = b
either wouldn't work or would be much more difficult to implement.
(Or at least that's my guess at the numpy dev's reasoning behind dropping dimension info when slicing)
I found a few reasonable solutions.
1) use numpy.take(X,[10],0)
2) use this strange indexing X[10:11:, :]
Ideally, this should be the default. I never understood why dimensions are ever dropped. But that's a discussion for numpy...
Here's an alternative I like better. Instead of indexing with a single number, index with a range. That is, use X[10:11,:]. (Note that 10:11 does not include 11).
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10:11,:]
xslice.shape # >> (1,10)
This makes it easy to understand with more dimensions too: no None juggling, and no figuring out which axis needs which kind of index. There's also no extra bookkeeping regarding array size; just use i:i+1 for any i that you would have used in regular indexing.
b = np.ones((2, 3, 4))
b.shape # >> (2, 3, 4)
b[1:2,:,:].shape # >> (1, 3, 4)
b[:, 2:3, :].shape # >> (2, 1, 4)
To add to the solution involving indexing by lists or arrays by gnebehay, it is also possible to use tuples:
X[(10,),:]
Dimension dropping is especially annoying if you're indexing by an array that might have length 1 only at runtime. For that case, there's np.ix_:
some_array[np.ix_(row_index,column_index)]
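A minimal illustration (the index names are placeholders):
import numpy as np
X = np.zeros((100, 10))
rows = np.array([10]) # may happen to have length 1 only at runtime
cols = np.arange(10)
X[np.ix_(rows, cols)].shape # >> (1, 10), dimensions preserved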
I've been using np.reshape to achieve the same, as shown below:
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:].reshape(1, -1)
xslice.shape # >> (1, 10)

python - repeating numpy array without replicating data

This question has been asked before, but the solution only works for 1D/2D arrays, and I need a more general answer.
How do you create a repeating array without replicating the data? This strikes me as something of general use, as it would help to vectorize python operations without the memory hit.
More specifically, I have a (y,x) array, which I want to tile multiple times to create a (z,y,x) array. I can do this with numpy.tile(array, (nz,1,1)), but I run out of memory. My specific case has x=1500, y=2000, z=700.
One simple trick is to use np.broadcast_arrays to broadcast your (x, y) array against a z-long vector in the first dimension:
import numpy as np
M = np.arange(1500*2000).reshape(1500, 2000)
z = np.zeros(700)
# broadcasting over the first dimension
_, M_broadcast = np.broadcast_arrays(z[:, None, None], M[None, ...])
print(M_broadcast.shape, M_broadcast.flags.owndata)
# (700, 1500, 2000), False
To generalize the stride_tricks method given for a 1D array in this answer, you just need to include the shape and stride length for each dimension of your output array:
M_strided = np.lib.stride_tricks.as_strided(
    M,                               # input array
    (700, M.shape[0], M.shape[1]),   # output shape
    (0, M.strides[0], M.strides[1])  # stride lengths in bytes
)
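On numpy >= 1.10 there is also np.broadcast_to, which produces the same read-only, non-copying view more directly (a sketch, not part of the original answers):
M_view = np.broadcast_to(M, (700,) + M.shape)
print(M_view.shape, M_view.flags.owndata)
# (700, 1500, 2000), False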
