Shuffling NumPy array along a given axis - python

Given the following NumPy array,
> a = array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])
it's simple enough to shuffle a single row,
> shuffle(a[0])
> a
array([[4, 2, 1, 3, 5],[1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])
Is it possible to use indexing notation to shuffle each of the rows independently, or do you have to iterate over the array? I had in mind something like,
> numpy.shuffle(a[:])
> a
array([[4, 2, 3, 5, 1],[3, 1, 4, 5, 2],[4, 2, 1, 3, 5]]) # Not the real output
though this clearly doesn't work.

Vectorized solution with rand+argsort trick
We could generate unique indices along the specified axis and index into the input array with advanced indexing. To generate the unique indices, we use the random-float generation + argsort trick, which gives us a vectorized solution. With np.take_along_axis it also generalizes to n-dimensional arrays and arbitrary axes. The final implementation would look something like this -
def shuffle_along_axis(a, axis):
    idx = np.random.rand(*a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)
Note that this shuffle won't be in-place and returns a shuffled copy.
Sample run -
In [33]: a
Out[33]:
array([[18, 95, 45, 33],
[40, 78, 31, 52],
[75, 49, 42, 94]])
In [34]: shuffle_along_axis(a, axis=0)
Out[34]:
array([[75, 78, 42, 94],
[40, 49, 45, 52],
[18, 95, 31, 33]])
In [35]: shuffle_along_axis(a, axis=1)
Out[35]:
array([[45, 18, 33, 95],
[31, 78, 52, 40],
[42, 75, 94, 49]])
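As a quick sanity check on the approach above, here is a minimal sketch that swaps in the newer Generator API (np.random.default_rng and the optional rng parameter are assumptions of this sketch; the answer itself uses the legacy np.random.rand). It confirms that every row still holds the same values after shuffling. The seed is arbitrary.

```python
import numpy as np

def shuffle_along_axis(a, axis, rng=None):
    """Return a copy of `a` with each lane along `axis` shuffled independently."""
    rng = np.random.default_rng() if rng is None else rng
    # Sorting random floats yields an independent random permutation per lane.
    idx = rng.random(a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
a = np.array([[18, 95, 45, 33],
              [40, 78, 31, 52],
              [75, 49, 42, 94]])
shuffled = shuffle_along_axis(a, axis=1, rng=rng)

# Each row keeps its own values; only their order changes.
same_values = np.array_equal(np.sort(shuffled, axis=1), np.sort(a, axis=1))
print(shuffled)
print(same_values)
```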

You have to call numpy.random.shuffle() several times because you are shuffling several sequences independently. numpy.random.shuffle() works on any mutable sequence and is not actually a ufunc. The shortest and most efficient code to shuffle all rows of a two-dimensional array a separately probably is
list(map(numpy.random.shuffle, a))
Some people prefer to write this as a list comprehension instead:
[numpy.random.shuffle(x) for x in a]
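Both one-liners above build a throwaway list of None values, since shuffle works in place and returns nothing. If that is unwanted, a plain loop does the same job. A minimal sketch, assuming a NumPy version that provides np.random.default_rng:

```python
import numpy as np

rng = np.random.default_rng()
a = np.array([[1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5]])

# Iterating over a 2-D array yields row views, so shuffling each view
# in place modifies `a` itself.
for row in a:
    rng.shuffle(row)

print(a)
```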

For those looking at this question more recently, NumPy's random.Generator provides the permuted method, which shuffles an array independently along the specified axis.
From the NumPy documentation:
rng = np.random.default_rng()
x = np.arange(24).reshape(3, 8)
x
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23]])
y = rng.permuted(x, axis=1)
y
array([[ 4, 3, 6, 7, 1, 2, 5, 0],
[15, 10, 14, 9, 12, 11, 8, 13],
[17, 16, 20, 21, 18, 22, 23, 19]])
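Since the original question asked for an in-place shuffle, it may be worth noting that permuted also accepts an out argument. The sketch below passes the input array itself as out, so the shuffled values are written back into x; the variable names are only illustrative.

```python
import numpy as np

rng = np.random.default_rng()
x = np.arange(24).reshape(3, 8)
original = x.copy()

# With out=x the shuffled rows are stored in x and x is returned.
result = rng.permuted(x, axis=1, out=x)

print(result is x)
print(x)
```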

Related

Numpy Delete for 2-dimensional array

I have an ndarray of shape (10, 3) and an index list of length 10:
import numpy as np
arr = np.arange(10* 3).reshape((10, 3))
idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1 , 0])
I want to use numpy delete (or a numpy function that is suited better for the task) to delete the values in arr as indicated by idxs for each row. So in the zeroth row of arr I want to delete the 0th entry, in the first the first, in the second the first, and so on.
I tried something like
np.delete(arr, idxs, axis=1)
but it won't work. Then I tried building an index list like this:
idlist = [np.arange(len(idxs)), idxs]
np.delete(arr, idlist)
but this doesn't give me the results I want either.
Quang's answer is good, but may benefit from some explanation.
np.delete works with whole rows or columns, not selected elements from each.
In [30]: arr = np.arange(10* 3).reshape((10, 3))
...: idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1 , 0])
Selecting items from the array is easy:
In [31]: arr[np.arange(10), idxs]
Out[31]: array([ 0, 4, 7, 10, 14, 15, 20, 23, 25, 27])
Selecting everything but these takes a bit more work. np.delete is complex, general-purpose code that does different things depending on the delete specification. One thing it can do is create a True mask and set the items to delete to False.
For your 2d case we can:
In [33]: mask = np.ones(arr.shape, bool)
In [34]: mask[np.arange(10), idxs] = False
In [35]: arr[mask]
Out[35]:
array([ 1, 2, 3, 5, 6, 8, 9, 11, 12, 13, 16, 17, 18, 19, 21, 22, 24,
26, 28, 29])
boolean indexing produces a flat array, so we need to reshape to get 2d:
In [36]: arr[mask].reshape(10,2)
Out[36]:
array([[ 1, 2],
[ 3, 5],
[ 6, 8],
[ 9, 11],
[12, 13],
[16, 17],
[18, 19],
[21, 22],
[24, 26],
[28, 29]])
Quang's answer creates the mask in another way:
In [37]: arr[np.arange(arr.shape[1]) != idxs[:,None]]
Out[37]:
array([ 1, 2, 3, 5, 6, 8, 9, 11, 12, 13, 16, 17, 18, 19, 21, 22, 24,
26, 28, 29])
Let's try extracting the other items by masking, then reshape:
arr[np.arange(arr.shape[1]) != idxs[:,None]].reshape(len(arr),-1)
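The masking approach can be wrapped in a small helper. This is only a sketch; the function name delete_per_row is hypothetical, and it assumes a 2-D array with exactly one index per row.

```python
import numpy as np

def delete_per_row(arr, idxs):
    """Drop column idxs[i] from row i of a 2-D array; returns a new array."""
    arr = np.asarray(arr)
    idxs = np.asarray(idxs)
    # mask[i, j] is True wherever column j is kept in row i.
    mask = np.arange(arr.shape[1]) != idxs[:, None]
    # Every row loses exactly one element, so the result has one column fewer.
    return arr[mask].reshape(arr.shape[0], arr.shape[1] - 1)

arr = np.arange(10 * 3).reshape((10, 3))
idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1, 0])
out = delete_per_row(arr, idxs)
print(out)
```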
Thanks for your question and the answers from Quang and hpaulj.
I just want to add a second scenario, where one wants to do the deletion along the other axis.
The index array now has only 3 elements because there are only 3 columns in arr, for example:
idxs2 = np.array([1,2,3])
To delete the elements of each column according to the index in idxs2, one can do this
arr.T[np.array(np.arange(arr.shape[0]) != idxs2[:,None])].reshape(len(idxs2),-1).T
And the result becomes:
array([[ 0, 1, 2],
[ 6, 4, 5],
[ 9, 10, 8],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26],
[27, 28, 29]])
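The column-wise variant can be sketched as a helper too. The name delete_per_column is hypothetical; the function assumes a 2-D array and one row index per column.

```python
import numpy as np

def delete_per_column(arr, idxs):
    """Drop row idxs[j] from column j of a 2-D array; returns a new array."""
    arr = np.asarray(arr)
    idxs = np.asarray(idxs)
    # Work on the transpose so that each original column becomes a row.
    mask = np.arange(arr.shape[0]) != idxs[:, None]   # shape (n_cols, n_rows)
    return arr.T[mask].reshape(arr.shape[1], arr.shape[0] - 1).T

arr = np.arange(10 * 3).reshape((10, 3))
idxs2 = np.array([1, 2, 3])
out = delete_per_column(arr, idxs2)
print(out)
```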

How to add values at repeat index locations on multidimensional arrays of Numpy?

I have a matrix B with shape (6, 9). For every row of B, I want to add 1 at some column indices. The column indices may appear more than once, so if an index appears m times, I want to add m to that column. Please see the following example code:
import numpy as np
B = np.arange(6*9).reshape(6, 9)
idx = np.array([[0, 1, 2],
                [6, 7, 0],
                [2, 3, 4],
                [4, 5, 6]], dtype=int)
B[:, idx] += 1 # the result is not what I want.
Furthermore, np.add.at and np.bincount also do not seem to work for the above case.
Any help would be appreciated. Thanks very much.
More information:
In the idx array, the indices 0, 2, 4 and 6 appear twice, so I want
B[:, [0, 2, 4, 6]] += 2. For the other indices, which appear once, just add 1. So the final B should be
B = np.array([[ 2, 2, 4, 4, 6, 6, 8, 8, 8],
[11, 11, 13, 13, 15, 15, 17, 17, 17],
[20, 20, 22, 22, 24, 24, 26, 26, 26],
[29, 29, 31, 31, 33, 33, 35, 35, 35],
[38, 38, 40, 40, 42, 42, 44, 44, 44],
[47, 47, 49, 49, 51, 51, 53, 53, 53]])
I think you can use np.add.at function to get what you want. Its syntax is
np.add.at('array', ('slice or array of indices for 1st dimension', 'slice or array of indices for 2nd dimension'), 'what to add')
So, in your case, if you want to add 1 for every row for every column, specified in idx, you should use
>>> a = np.arange(6 * 9).reshape(6, 9)
>>> np.add.at(a, (np.s_[:], idx), 1)
np.s_[:] is a slice object that selects every row, so the addition is applied to each row.
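To check this against the expected result from the question, the sketch below runs np.add.at on a copy and compares it with an alternative that counts the column occurrences with np.bincount and broadcasts the counts over all rows. The variable names are illustrative only.

```python
import numpy as np

B = np.arange(6 * 9).reshape(6, 9)
idx = np.array([[0, 1, 2],
                [6, 7, 0],
                [2, 3, 4],
                [4, 5, 6]])

# Unbuffered add: a column listed m times in idx is incremented m times.
via_add_at = B.copy()
np.add.at(via_add_at, (np.s_[:], idx), 1)

# Alternative: count how often each column occurs, then broadcast the counts.
counts = np.bincount(idx.ravel(), minlength=B.shape[1])
via_bincount = B + counts

print(counts)
print(np.array_equal(via_add_at, via_bincount))
```

The bincount variant avoids the unbuffered update and simply relies on broadcasting a length-9 vector against the (6, 9) matrix.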

concatenate two 1d array into one 2d array and again break it after reshuffle in python

I have these two 1d arrays A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and its label L = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]; where L[i] is the label of A[i].
Objective : I need to randomly shuffle both the 1d arrays in such a way that their labels stay in the same index.
e.g: After shuffle:
A= [2, 4, 9, 1, 3, 6, 0, 7, 5] then
L= [7, 5, 0, 8, 6, 3, 9, 2, 4], i.e. each A[i] should keep the same label L[i] it had originally.
I was thinking of concatenating the two 1d arrays into a single 2d array, shuffling it, and then separating it into two 1d arrays again. It's not working, and I am stuck at the shuffle step.
Below is the code that I tried
import numpy as np
import random
# initializing the contents
A = np.arange(0,10)
length = len(A)
print(length)
print(A)
labels = np.zeros(10)
for index in range(length):
    labels[index] = A[length-index-1]
print(labels)
# end, contents ready
combine = []
combine.append([A, labels])
print(combine)
random.shuffle(combine)
print("After shuffle")
print(combine)
If you are using NumPy, just use a NumPy-style approach. Create the pairs using np.column_stack and shuffle them with the numpy.random.shuffle function:
pairs = np.column_stack((A, L))
np.random.shuffle(pairs)
Demo:
In [16]: arr = np.column_stack((A, L))
In [17]: np.random.shuffle(arr)
In [18]: arr
Out[18]:
array([[4, 5],
[5, 4],
[7, 2],
[1, 8],
[3, 6],
[6, 3],
[8, 1],
[2, 7],
[9, 0],
[0, 9]])
If you want to get the arrays just do a simple indexing:
In [19]: arr[:,0]
Out[19]: array([4, 5, 7, 1, 3, 6, 8, 2, 9, 0])
In [20]: arr[:,1]
Out[20]: array([5, 4, 2, 8, 6, 3, 1, 7, 0, 9])
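Put together as one runnable piece, the column_stack approach might look like the sketch below. It uses the Generator API (np.random.default_rng) instead of the legacy np.random.shuffle, which is an assumption of this sketch. Note that np.column_stack produces a single array with one common dtype, so this fits best when values and labels have the same type.

```python
import numpy as np

rng = np.random.default_rng()
A = np.arange(10)
L = A[::-1].copy()                # L[i] is the label of A[i]

pairs = np.column_stack((A, L))   # shape (10, 2): one (value, label) pair per row
rng.shuffle(pairs)                # shuffles the rows in place, keeping pairs intact

A_shuffled, L_shuffled = pairs.T  # unpack the two columns
print(A_shuffled)
print(L_shuffled)
```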
Your thought was in the right direction. You just needed some Python-Fu:
from random import shuffle
A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
L = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
res = list(zip(A, L))
shuffle(res) # shuffles in-place!
A, L = zip(*res) # unzip
print(A) # -> (4, 0, 2, 1, 8, 7, 9, 6, 5, 3)
print(L) # -> (5, 9, 7, 8, 1, 2, 0, 3, 4, 6)
The unzipping operation is explained here in detail in case you are wondering how it works.
You can also keep an index array np.arange(size) where size is the length of A and L and do shuffling on this array. Then use this array to rearrange A and L.
idx = np.arange(10)
np.random.shuffle(idx) # shuffles in place and returns None; alternatively idx = np.random.permutation(10)
A = np.arange(100).reshape(10, 10)
L = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
L[idx], A[idx]
# output
(array([2, 5, 1, 7, 8, 9, 0, 6, 4, 3]),
array([[70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]]))
Reference
Numpy: Rearrange array based upon index array
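For the two 1d arrays from the question, the index-array idea can be sketched as follows. rng.permutation is used here to get the shuffled indices in one step (an assumption of this sketch; the answer shuffles np.arange in place). Because each array is indexed separately, values and labels may have different dtypes or shapes.

```python
import numpy as np

rng = np.random.default_rng()
A = np.arange(10)
L = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

idx = rng.permutation(len(A))     # a shuffled copy of 0..9
A_shuffled, L_shuffled = A[idx], L[idx]

print(A_shuffled)
print(L_shuffled)
```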

How to delete multiple rows of NumPy array?

How can I delete multiple rows of NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax
There are several ways to delete rows from NumPy array.
The easiest one is to use basic indexing as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
Yet another option is to use np.delete as in the question. For selecting the rows/columns for deletion it can accept slice objects, int, or array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
But all this time that I've been using NumPy I never needed this np.delete, as in this case it's much more convenient to use boolean indexing.
As an example, if I wanted to select those rows that start with a value less than 12, or remove them by inverting the mask, I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
For more information refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
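To relate the boolean-mask approach back to np.delete, the sketch below removes the rows whose first value is greater than 12 in both ways and compares the results. np.flatnonzero is used to turn the mask into row indices; the variable names are illustrative.

```python
import numpy as np

x = np.arange(35).reshape(7, 5)
drop = x[:, 0] > 12                      # rows to remove

kept_by_mask = x[~drop]                  # keep everything not flagged
kept_by_delete = np.delete(x, np.flatnonzero(drop), axis=0)

print(kept_by_mask)
print(np.array_equal(kept_by_mask, kept_by_delete))
```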
If you want to delete selected rows, you can write
np.delete(x, (1,2,5), axis = 0)
This will delete rows 1, 2 and 5. If you want to delete a range such as (0:5), try this one
np.delete(x, np.s_[0:5], axis = 0)
This deletes rows 0 to 4 from your array.
np.s_[0:5] --->> slice(0, 5, None)
Both are the same.
Pass the row numbers to delete as a list argument.
General Syntax:
np.delete(array_name,[rownumber1,rownumber2,..,rownumber n],axis=0)
Example: delete first three rows in an array:
np.delete(array_name,[0,1,2],axis=0)
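The different ways of removing the first five rows mentioned in the answers can be compared side by side. A minimal sketch; it also shows that np.delete returns a new array and leaves x unchanged, while basic slicing returns a view.

```python
import numpy as np

x = np.arange(35).reshape(7, 5)

a = np.delete(x, slice(0, 5), axis=0)      # explicit slice object
b = np.delete(x, np.s_[0:5], axis=0)       # np.s_ builds the same slice
c = np.delete(x, [0, 1, 2, 3, 4], axis=0)  # list of row numbers
d = x[5:]                                  # basic slicing; a view, not a copy

print(a)
print(np.shares_memory(d, x), np.shares_memory(a, x))
```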

What does transpose(3, 0, 1, 2) mean?

What does this mean?
data.transpose(3, 0, 1, 2)
Also, if data.shape == (10, 10, 10), why do I get ValueError: axes don't match array?
Let me discuss in terms of Python3.
I use the transpose function in python as data.transpose(3, 0, 1, 2)
This is wrong because the operation requires 4 dimensions, while you only provide 3 (as in (10,10,10)). The same mismatch can be reproduced the other way around, with a 4-D array and only 3 axes:
>>> a = np.arange(60).reshape((1,4,5,3))
>>> b = a.transpose((2,0,1))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: axes don't match array
You can add another dimension simply by reshaping (10,10,10) to (1,10,10,10) if the image batch size is 1. This can be done as:
w,h,c = original_image.shape #10,10,10
modified_img = np.reshape(original_image, (1,w,h,c)) #(1,10,10,10)
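Besides reshape, a leading batch axis can also be added with np.newaxis or np.expand_dims. The sketch below uses a zero-filled placeholder for original_image (an assumption for illustration) and shows that the 4-axis transpose then succeeds.

```python
import numpy as np

original_image = np.zeros((10, 10, 10))   # placeholder data for illustration

batched_a = original_image[np.newaxis, ...]          # shape (1, 10, 10, 10)
batched_b = np.expand_dims(original_image, axis=0)   # same result

# With four axes available, transpose(3, 0, 1, 2) no longer raises.
moved = batched_a.transpose(3, 0, 1, 2)
print(batched_a.shape, batched_b.shape, moved.shape)
```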
What do 3, 0, 1, 2 mean?
For 2D NumPy arrays, transpose operates just as the name says. For higher-dimensional arrays like yours, it permutes the axes in the given order; in the example below it gives the same result as moveaxis.
>>> a = np.arange(60).reshape((4,5,3))
>>> b = a.transpose((2,0,1))
>>> b.shape
(3, 4, 5)
>>> c = np.moveaxis(a,-1,0)
>>> c.shape
(3, 4, 5)
>>> b
array([[[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57]],
[[ 1, 4, 7, 10, 13],
[16, 19, 22, 25, 28],
[31, 34, 37, 40, 43],
[46, 49, 52, 55, 58]],
[[ 2, 5, 8, 11, 14],
[17, 20, 23, 26, 29],
[32, 35, 38, 41, 44],
[47, 50, 53, 56, 59]]])
>>> c
array([[[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57]],
[[ 1, 4, 7, 10, 13],
[16, 19, 22, 25, 28],
[31, 34, 37, 40, 43],
[46, 49, 52, 55, 58]],
[[ 2, 5, 8, 11, 14],
[17, 20, 23, 26, 29],
[32, 35, 38, 41, 44],
[47, 50, 53, 56, 59]]])
As evident, both methods work the same.
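transpose(3, 0, 1, 2) moves the last axis to the front, so an array laid out as (samples, rows, columns, channels) becomes (channels, samples, rows, columns). Converting to (samples, channels, rows, columns), as is common when going from OpenCV to PyTorch, would be transpose(0, 3, 1, 2) instead.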
Have a look at numpy.transpose
Use transpose(a, argsort(axes)) to invert the transposition of tensors
when using the axes keyword argument.
Transposing a 1-D array returns an unchanged view of the original
array.
e.g.
>>> x = np.arange(4).reshape((2,2))
>>> x
array([[0, 1],
[2, 3]])
>>>
>>> np.transpose(x)
array([[0, 2],
[1, 3]])
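The note about inverting a transposition with argsort can be illustrated with the axes from the question. A minimal sketch with an arbitrary 4-D test array:

```python
import numpy as np

a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
axes = (3, 0, 1, 2)

forward = a.transpose(axes)                  # shape (5, 2, 3, 4)
inverse_axes = np.argsort(axes)              # the permutation that undoes `axes`
restored = np.transpose(forward, inverse_axes)

print(forward.shape, inverse_axes, restored.shape)
```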
You specified too many values in the transpose
>>> a = np.arange(8).reshape(2,2,2)
>>> a.shape
(2, 2, 2)
>>> a.transpose([2,0,1])
array([[[0, 2],
[4, 6]],
[[1, 3],
[5, 7]]])
>>> a.transpose(3,0,1,2)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: axes don't match array
>>>
From the NumPy documentation on np.transpose, the second argument of the np.transpose function is axes, which is an optional list of ints:
By default, reverse the dimensions, otherwise permute the axes
according to the values given.
Example :
>>> x = np.arange(9).reshape((3,3))
>>> x
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.transpose(x, (0,1))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.transpose(x, (1,0))
array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])
The thing is that you have taken a 3-dimensional array and applied a 4-dimensional transpose.
Your command converts a 4-D array (batch, rows, cols, channel) to another 4-D array (channel, batch, rows, cols), but you need a command for a 3-D array. So remove the 3 and write
data.transpose(2, 0, 1).
For all i, j, k, l, the following holds true:
arr[i, j, k, l] == arr.transpose(3, 0, 1, 2)[l, i, j, k]
transpose(3, 0, 1, 2) reorders the array dimensions from (a, b, c, d) to (d, a, b, c):
>>> arr = np.zeros((10, 11, 12, 13))
>>> arr.transpose(3, 0, 1, 2).shape
(13, 10, 11, 12)
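The index identity above can be checked exhaustively on a small array. This sketch uses an arbitrary (2, 3, 4, 5) array so that every axis has a different length:

```python
import numpy as np

arr = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
t = arr.transpose(3, 0, 1, 2)

# Check arr[i, j, k, l] == t[l, i, j, k] for every index combination.
identity_holds = all(
    arr[i, j, k, l] == t[l, i, j, k]
    for i, j, k, l in np.ndindex(arr.shape)
)

print(t.shape, identity_holds)
```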
