Problem
I have a np.array and a mask of the same shape. Once I apply the mask, the array loses its shape and becomes a flattened, one-dimensional array.
Question
I want to reduce my array along some axis, based on a 1D mask whose length matches that axis.
How can I apply a mask, but keep dimensionality of the array?
Example
A small example in code:
# data ...
>>> data = np.ones((4, 4))
>>> data.shape
(4, 4)
# mask ...
>>> mask = np.ones((4, 4), dtype=bool)
>>> mask.shape
(4, 4)
# apply mask ...
>>> data[mask].shape
(16,)
My ideal shape would be (4, 4).
An example with array dimension reduction across an axis:
# data, mask ...
>>> data = np.ones((4, 4))
>>> mask = np.ones((4, 4), dtype=bool)
# remove last column from data ...
>>> mask[:, 3] = False
>>> mask
array([[ True, True, True, False],
[ True, True, True, False],
[ True, True, True, False],
[ True, True, True, False]])
# equivalent mask in 1D ...
>>> mask[0]
array([ True, True, True, False])
# apply mask ...
>>> data[mask].shape
(12,)
The ideal dimensions of the array would be (4, 3) without reshape.
Help is appreciated, thanks!
The 'correct' way of achieving your goal is not to expand the mask to 2D. Instead, index with [:, mask] using the 1D mask. This tells numpy that axis 0 is left unchanged and the mask is applied along axis 1.
a = np.arange(12).reshape(3, 4)
b = np.array((1, 0, 1, 0), '?')  # '?' is the boolean dtype
a
# array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]])
b
# array([ True, False, True, False])
a[:, b]
# array([[ 0, 2],
# [ 4, 6],
# [ 8, 10]])
If your mask is already 2D, numpy won't check whether all its rows are the same because that would be inefficient. But obviously you can use [:, mask[0]] in that case.
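For instance, a quick sketch (mask2d here is a hypothetical, already-expanded 2D mask whose rows are all identical):
import numpy as np

a = np.arange(12).reshape(3, 4)
mask2d = np.tile(np.array([True, False, True, False]), (3, 1))  # every row identical

# take the first row of the 2D mask and use it as a 1D column mask
a[:, mask2d[0]]
# array([[ 0,  2],
#        [ 4,  6],
#        [ 8, 10]])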
If your mask is 2D and just happens to have the same number of Trues in each row, then either use @tel's answer or create an index array:
B = b^b[:3, None]
B
# array([[False, True, False, True],
# [ True, False, True, False],
# [False, True, False, True]])
J = np.where(B)[1].reshape(len(B), -1)
And now either
np.take_along_axis(a, J, 1)
# array([[ 1, 3],
# [ 4, 6],
# [ 9, 11]])
or
I = np.arange(len(J))[:, None]
IJ = I, J
a[IJ]
# array([[ 1, 3],
# [ 4, 6],
# [ 9, 11]])
I believe what you want can be done by calling new_data.reshape(837, -1), i.e. reshaping the flat masked result back to the known number of rows. Here's a brief example:
arr = np.arange(8*6).reshape(8,6)
maskpiece = np.array([True, False]*3)
mask = np.broadcast_to(maskpiece, (8,6))
print('the original array\n%s\n' % arr)
print('the flat masked array\n%s\n' % arr[mask])
print('the masked array reshaped into 2D\n%s\n' % arr[mask].reshape(8, -1))
Output:
the original array
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]
[36 37 38 39 40 41]
[42 43 44 45 46 47]]
the flat masked array
[ 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46]
the masked array reshaped into 2D
[[ 0 2 4]
[ 6 8 10]
[12 14 16]
[18 20 22]
[24 26 28]
[30 32 34]
[36 38 40]
[42 44 46]]
Related
For the 2D array y:
y = np.arange(20).reshape(5,4)
---
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
All of these indexing expressions select the 1st, 3rd, and 5th rows. This is clear.
print(y[
[0, 2, 4],
::
])
print(y[
[True, False, True, False, True],
::
])
---
[[ 0 1 2 3]
[ 8 9 10 11]
[16 17 18 19]]
Questions
Please help me understand what rules or mechanisms are at work to produce these results.
Replacing the list with a tuple produces an empty array with shape (0, 5, 4).
y[
(True, False, True, False, True)
]
---
array([], shape=(0, 5, 4), dtype=int64)
Using a single True adds a new axis.
y[True]
---
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]]])
y[True].shape
---
(1, 5, 4)
Adding an additional True boolean produces the same result.
y[True, True]
---
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]]])
y[True, True].shape
---
(1, 5, 4)
However, adding a False boolean produces the empty array again.
y[True, False]
---
array([], shape=(0, 5, 4), dtype=int64)
I am not sure the documentation explains this behavior:
Boolean array indexing
In general if an index includes a Boolean array, the result will be
identical to inserting obj.nonzero() into the same position and using
the integer array indexing mechanism described above. x[ind_1,
boolean_array, ind_2] is equivalent to x[(ind_1,) +
boolean_array.nonzero() + (ind_2,)].
If there is only one Boolean array and no integer indexing array
present, this is straight forward. Care must only be taken to make
sure that the boolean index has exactly as many dimensions as it is
supposed to work with.
Boolean scalar indexing is not well-documented, but you can trace how it is handled in the source code. See for example this comment and associated code in the numpy source:
/*
* This can actually be well defined. A new axis is added,
* but at the same time no axis is "used". So if we have True,
* we add a new axis (a bit like with np.newaxis). If it is
* False, we add a new axis, but this axis has 0 entries.
*/
So if an index is a scalar boolean, a new axis is added. If the value is True the size of the axis is 1, and if the value is False, the size of the axis is zero.
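As a small check of that description (my own illustration, not from the original answer):
import numpy as np

x = np.arange(6).reshape(2, 3)

# True behaves like np.newaxis: a length-1 axis is prepended
assert x[True].shape == (1, 2, 3)
assert np.array_equal(x[True], x[np.newaxis])

# False also prepends an axis, but one with 0 entries
assert x[False].shape == (0, 2, 3)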
This behavior was introduced in numpy#3798, and the author outlines the motivation in this comment; roughly, the aim was to provide consistency in the output of filtering operations. For example:
x = np.ones((2, 2))
assert x[x > 0].ndim == 1
x = np.ones(2)
assert x[x > 0].ndim == 1
x = np.ones(())
assert x[x > 0].ndim == 1 # scalar boolean here!
The interesting thing is that any subsequent scalar booleans after the first do not add additional dimensions! From an implementation standpoint, this seems to be due to consecutive 0D boolean indices being treated as equivalent to consecutive fancy indices (i.e. HAS_0D_BOOL is treated as HAS_FANCY in some cases) and thus are combined in the same way as fancy indices. From a logical standpoint, this corner-case behavior does not appear to be intentional: for example, I can't find any discussion of it in numpy#3798.
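Restating the question's own results in those terms (a quick check, not from the original answer):
import numpy as np

y = np.arange(20).reshape(5, 4)

# the scalar booleans combine like 1-element (True) or 0-element (False) fancy indices,
# so only a single extra axis appears in the result
assert y[True, True].shape == (1, 5, 4)
assert y[True, False].shape == (0, 5, 4)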
Given that, I would recommend considering this behavior poorly-defined, and avoid it in favor of well-documented indexing approaches.
I have a 2D array A:
28 39 52
77 80 66
7 18 24
9 97 68
And a vector array of column indexes B:
1
0
2
0
How, in a Pythonic way, using base Python or NumPy, can I select the elements from A which do NOT correspond to the column indexes in B?
I should get this 2D array, which contains the elements of A that do not correspond to the column indexes stored in B:
28 52
80 66
7 18
97 68
You can make use of broadcasting and a row-wise mask to select, for each row, the elements whose column index is not in B:
Setup
A = np.array([[28, 39, 52], [77, 80, 66], [7, 18, 24], [9, 97, 68]])
B = np.array([1, 0, 2, 0])
cols = np.arange(A.shape[1])
Now use broadcasting to create a mask, and index your array.
mask = B[:, None] != cols
A[mask].reshape(-1, 2)
array([[28, 52],
[80, 66],
[ 7, 18],
[97, 68]])
A spin-off of my answer to your other question,
Replace 2D array elements with zeros, using a column index vector
We can make a boolean mask with the same indexing used before:
In [124]: mask = np.ones(A.shape, dtype=bool)
In [126]: mask[np.arange(4), B] = False
In [127]: mask
Out[127]:
array([[ True, False, True],
[False, True, True],
[ True, True, False],
[False, True, True]])
Indexing an array with a boolean mask produces a 1d array, since in the most general case such a mask could select a different number of elements in each row.
In [128]: A[mask]
Out[128]: array([28, 52, 80, 66, 7, 18, 97, 68])
In this case the result can be reshaped back to 2d:
In [129]: A[mask].reshape(4,2)
Out[129]:
array([[28, 52],
[80, 66],
[ 7, 18],
[97, 68]])
Since you allowed for 'base Python', here's a list comprehension answer:
In [136]: [[y for i,y in enumerate(x) if i!=b] for b,x in zip(B,A)]
Out[136]: [[28, 52], [80, 66], [7, 18], [97, 68]]
If all the 0's in the other question's A come from the insertion, then we can also get the mask (Out[127]) with
In [142]: A!=0
Out[142]:
array([[ True, False, True],
[False, True, True],
[ True, True, False],
[False, True, True]])
I have the following array:
import numpy as np
a = np.array([[ 1, 2, 3],
[ 1, 2, 3],
[ 1, 2, 3]])
I understand that np.random.shuffle(a.T) will shuffle the array along the rows, but what I need is for it to shuffle each row independently. How can this be done in numpy? Speed is critical as there will be several million rows.
For this specific problem, each row will contain the same starting population.
import numpy as np
np.random.seed(2018)
def scramble(a, axis=-1):
"""
Return an array with the values of `a` independently shuffled along the
given axis
"""
b = a.swapaxes(axis, -1)
n = a.shape[axis]
idx = np.random.choice(n, n, replace=False)
b = b[..., idx]
return b.swapaxes(axis, -1)
a = np.arange(4*9).reshape(4, 9)
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
# [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
# [18, 19, 20, 21, 22, 23, 24, 25, 26],
# [27, 28, 29, 30, 31, 32, 33, 34, 35]])
print(scramble(a, axis=1))
yields
[[ 3 8 7 0 4 5 1 2 6]
[12 17 16 9 13 14 10 11 15]
[21 26 25 18 22 23 19 20 24]
[30 35 34 27 31 32 28 29 33]]
while scrambling along the 0-axis:
print(scramble(a, axis=0))
yields
[[18 19 20 21 22 23 24 25 26]
[ 0 1 2 3 4 5 6 7 8]
[27 28 29 30 31 32 33 34 35]
[ 9 10 11 12 13 14 15 16 17]]
This works by first swapping the target axis with the last axis:
b = a.swapaxes(axis, -1)
This is a common trick used to standardize code which deals with one axis.
It reduces the general case to the specific case of dealing with the last axis.
Since in NumPy version 1.10 or higher swapaxes returns a view, there is no copying involved and so calling swapaxes is very quick.
Now we can generate a new index order for the last axis:
n = a.shape[axis]
idx = np.random.choice(n, n, replace=False)
Now we can shuffle b (independently along the last axis):
b = b[..., idx]
and then reverse the swapaxes to return an a-shaped result:
return b.swapaxes(axis, -1)
If you don't want a return value and want to operate on the array directly, you can specify the indices to shuffle.
>>> import numpy as np
>>>
>>>
>>> a = np.array([[1,2,3], [1,2,3], [1,2,3]])
>>>
>>> # Shuffle row `2` independently
>>> np.random.shuffle(a[2])
>>> a
array([[1, 2, 3],
[1, 2, 3],
[3, 2, 1]])
>>>
>>> # Shuffle column `0` independently
>>> np.random.shuffle(a[:,0])
>>> a
array([[3, 2, 3],
[1, 2, 3],
[1, 2, 1]])
If you want a return value as well, you can use numpy.random.permutation, in which case replace np.random.shuffle(a[n]) with a[n] = np.random.permutation(a[n]).
Warning, do not do a[n] = np.random.shuffle(a[n]). shuffle does not return anything, so the row/column you end up "shuffling" will be filled with nan instead.
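For example, a minimal sketch of the permutation-based variant (my own example, shuffling row 2 as above):
import numpy as np

a = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# permutation returns a shuffled copy, so assigning it back to the row is safe
a[2] = np.random.permutation(a[2])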
Good answer above. But I will throw in a quick and dirty way:
a = np.array([[1,2,3], [1,2,3], [1,2,3]])
ignore_list_output = [np.random.shuffle(x) for x in a]  # shuffles each row of a in place; the list itself just holds Nones
Then, a can be something like this:
array([[2, 1, 3],
       [3, 2, 1],
       [1, 3, 2]])
Not very elegant but you can get this job done with just one short line.
Building on my comment to @Hun's answer, here's the fastest way to do this:
def shuffle_along(X):
"""Minimal in place independent-row shuffler."""
[np.random.shuffle(x) for x in X]
This works in-place and can only shuffle rows. If you need more options:
def shuffle_along(X, axis=0, inline=False):
"""More elaborate version of the above."""
if not inline:
X = X.copy()
if axis == 0:
[np.random.shuffle(x) for x in X]
if axis == 1:
[np.random.shuffle(x) for x in X.T]
if not inline:
return X
This, however, has the limitation of only working on 2d-arrays. For higher dimensional tensors, I would use:
def shuffle_along(X, axis=0, inline=True):
"""Shuffle along any axis of a tensor."""
if not inline:
X = X.copy()
np.apply_along_axis(np.random.shuffle, axis, X)  # shuffle each 1-D slice along the chosen axis in place
if not inline:
return X
You can do it with numpy without any loop or extra function, and much faster. E.g., we have an array of shape (6, 2) and we want a (2, 2) sub-array with an independent random index for each column.
import numpy as np
test = np.array([[1, 1],
[2, 2],
[0.5, 0.5],
[0.3, 0.3],
[4, 4],
[7, 7]])
id_rnd = np.random.randint(6, size=(2, 2))  # random row indices; use np.random.choice without replacement if you don't want repeats
new = np.take_along_axis(test, id_rnd, axis=0)
Out:
array([[2. , 2. ],
[0.5, 2. ]])
It works for any number of dimensions.
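If you need a full independent shuffle of every row rather than a random sub-selection, the same take_along_axis idea can be combined with an argsort of random keys (a sketch of my own, not part of the original answer):
import numpy as np

a = np.arange(12).reshape(3, 4)

# argsort of uniform random keys yields an independent permutation per row
perm = np.argsort(np.random.rand(*a.shape), axis=1)
shuffled = np.take_along_axis(a, perm, axis=1)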
As of NumPy 1.20.0 released in January 2021 we have a permuted() method on the new Generator type (introduced with the new random API in NumPy 1.17.0, released in July 2019). This does exactly what you need:
import numpy as np
rng = np.random.default_rng()
a = np.array([
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
])
shuffled = rng.permuted(a, axis=1)
This gives you something like
>>> print(shuffled)
[[2 3 1]
[1 3 2]
[2 1 3]]
As you can see, the rows are permuted independently. This is in sharp contrast with both rng.permutation() and rng.shuffle().
If you want an in-place update you can pass the original array as the out keyword argument. And you can use the axis keyword argument to choose the direction along which to shuffle your array.
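For example, a minimal sketch of the in-place variant (my own example):
import numpy as np

rng = np.random.default_rng()
a = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# shuffle each row independently and write the result back into a
rng.permuted(a, axis=1, out=a)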
I want to apply a mask on a 2D numpy array, but it does not work correctly. Suppose I have
val(lat, lon) ---> my 2D array (20, 30)
Mask_lat = np.ma.masked_array(lat, mask=latmask) ---> masked lat (5,)
Mask_lon = np.ma.masked_array(lon, mask =lonmask) ---> masked lon (8,)
Mask_val = np.ma.masked_array(val, mask=mask_lat_lon) ---> ?
I do not know how I can pass a correct mask_lat_lon to get a masked val of shape (5, 8). I would appreciate it if someone could guide me.
Thank you in advance.
If I understand your question correctly, you have two 1D arrays that represent y and x (lat and long) positions in a 2D array. You want to mask a region based on the x/y position in the 2D array.
The key part to understand is that the mask for a 2D array is also 2D.
For example, let's mask a single element of a 2D array:
import numpy as np
z = np.arange(20).reshape(5, 4)
mask = np.zeros(z.shape, dtype=bool)
mask[3, 2] = True
print(z)
print(np.ma.masked_array(z, mask))
This yields:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 -- 15]
[16 17 18 19]]
In your case, you have two 1D x and y arrays that you need to create a 2D mask from. For example:
import numpy as np
x = np.linspace(-85, -78, 4)
y = np.linspace(32, 37, 5)
z = np.arange(20).reshape(5, 4)
xmask = (x > -82.6) & (x < -80)
ymask = (y > 33) & (y < 35.6)
print(xmask)
print(ymask)
We'd then need to combine them into a single 2D mask using broadcasting:
mask = xmask[np.newaxis, :] & ymask[:, np.newaxis]
Slicing with newaxis (or None, they're the same object) adds a new axis at that position, turning the 1D array into a 2D array. If you haven't seen this before, it's useful to take a quick look at what xmask[np.newaxis, :] and ymask[:, np.newaxis] look like:
In [14]: xmask
Out[14]: array([False, False, True, False], dtype=bool)
In [15]: ymask
Out[15]: array([False, True, True, False, False], dtype=bool)
In [16]: xmask[np.newaxis, :]
Out[16]: array([[False, False, True, False]], dtype=bool)
In [17]: ymask[:, np.newaxis]
Out[17]:
array([[False],
[ True],
[ True],
[False],
[False]], dtype=bool)
mask will then be (keep in mind that True elements are masked):
In [18]: xmask[np.newaxis, :] & ymask[:, np.newaxis]
Out[18]:
array([[False, False, False, False],
[False, False, True, False],
[False, False, True, False],
[False, False, False, False],
[False, False, False, False]], dtype=bool)
Finally, we can create a 2D masked array from z based on this mask:
arr = np.ma.masked_array(z, mask)
Which gives us our final result:
[[ 0 1 2 3]
[ 4 5 -- 7]
[ 8 9 -- 11]
[12 13 14 15]
[16 17 18 19]]
The question is the inverse of this question. I'm looking for a generic method to form the original big array from the small arrays:
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 9, 10, 11]],
[[12, 13, 14],
[18, 19, 20]],
[[15, 16, 17],
[21, 22, 23]]])
->
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
I am currently developing a solution and will post it when it's done, but I would like to see other (better) ways.
import numpy as np
def blockshaped(arr, nrows, ncols):
"""
Return an array of shape (n, nrows, ncols) where
n * nrows * ncols = arr.size
If arr is a 2D array, the returned array looks like n subblocks with
each subblock preserving the "physical" layout of arr.
"""
h, w = arr.shape
return (arr.reshape(h//nrows, nrows, -1, ncols)
.swapaxes(1,2)
.reshape(-1, nrows, ncols))
def unblockshaped(arr, h, w):
"""
Return an array of shape (h, w) where
h * w = arr.size
If arr is of shape (n, nrows, ncols), i.e. n subblocks of shape (nrows, ncols),
then the returned array preserves the "physical" layout of the sublocks.
"""
n, nrows, ncols = arr.shape
return (arr.reshape(h//nrows, -1, nrows, ncols)
.swapaxes(1,2)
.reshape(h, w))
For example,
c = np.arange(24).reshape((4,6))
print(c)
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]]
print(blockshaped(c, 2, 3))
# [[[ 0 1 2]
# [ 6 7 8]]
# [[ 3 4 5]
# [ 9 10 11]]
# [[12 13 14]
# [18 19 20]]
# [[15 16 17]
# [21 22 23]]]
print(unblockshaped(blockshaped(c, 2, 3), 4, 6))
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]]
Note that there is also superbatfish's
blockwise_view. It arranges the
blocks in a different format (using more axes) but it has the advantage of (1)
always returning a view and (2) being capable of handling arrays of any
dimension.
Yet another (simple) approach:
threedarray = ...
twodarray = np.array([x.flatten() for x in threedarray])
print(twodarray.shape)
I hope I got you right; let's say we have a, b:
>>> a = np.array([[1,2] ,[3,4]])
>>> b = np.array([[5,6] ,[7,8]])
>>> a
array([[1, 2],
[3, 4]])
>>> b
array([[5, 6],
[7, 8]])
In order to make it one big 2D array, use numpy.concatenate:
>>> c = np.concatenate((a,b), axis=1 )
>>> c
array([[1, 2, 5, 6],
[3, 4, 7, 8]])
It works for the images I have tested so far. I will update if further tests turn up problems. It is, however, a solution which takes no account of speed and memory usage.
def unblockshaped(blocks, h, w):
    n, nrows, ncols = blocks.shape
    bpc = w // ncols  # blocks per row of the output
    bpr = h // nrows  # blocks per column of the output
    reconstructed = np.zeros((h, w), dtype=blocks.dtype)
    t = 0
    for i in range(bpr):
        for j in range(bpc):
            reconstructed[i*nrows:i*nrows+nrows, j*ncols:j*ncols+ncols] = blocks[t]
            t = t + 1
    return reconstructed
Here is a solution one can use if one wishes to create tiles of a matrix:
from itertools import product
import numpy as np
def tiles(arr, nrows, ncols):
"""
If arr is a 2D array, the returned list contains nrowsXncols numpy arrays
with each array preserving the "physical" layout of arr.
When the array shape (rows, cols) is not divisible by (nrows, ncols), some of the tile dimensions can differ, following numpy.array_split.
"""
rows, cols = arr.shape
col_arr = np.array_split(range(cols), ncols)
row_arr = np.array_split(range(rows), nrows)
return [arr[r[0]: r[-1]+1, c[0]: c[-1]+1]
for r, c in product(row_arr, col_arr)]
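A quick usage sketch (my own example, reusing the 4x6 array c from earlier), showing the four tiles produced by tiles(c, 2, 2):
c = np.arange(24).reshape((4, 6))
for t in tiles(c, 2, 2):
    print(t)
# [[0 1 2]
#  [6 7 8]]
# [[ 3  4  5]
#  [ 9 10 11]]
# [[12 13 14]
#  [18 19 20]]
# [[15 16 17]
#  [21 22 23]]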