Boolean masking on multiple axes with numpy - python

I want to apply boolean masking both to rows and columns.
With
X = np.array([[1,2,3],[4,5,6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])
X[mask1, mask2]
I expect the output to be
array([[1,2],[4,5]])
instead of
array([1,5])
It's known that
X[:, mask2]
can be used here but that's not a solution for the general case.
I would like to know how it works under the hood and why in this case the result is array([1,5]).

X[mask1, mask2] is described in Boolean Array Indexing Doc as the equivalent of
In [249]: X[mask1.nonzero()[0], mask2.nonzero()[0]]
Out[249]: array([1, 5])
In [250]: X[[0,1], [0,1]]
Out[250]: array([1, 5])
In effect it is giving you X[0,0] and X[1,1] (pairing the 0s and 1s).
What you want instead is:
In [251]: X[[[0],[1]], [0,1]]
Out[251]:
array([[1, 2],
[4, 5]])
np.ix_ is a handy tool for creating the right mix of dimensions
In [258]: np.ix_([0,1],[0,1])
Out[258]:
(array([[0],
[1]]), array([[0, 1]]))
In [259]: X[np.ix_([0,1],[0,1])]
Out[259]:
array([[1, 2],
[4, 5]])
That's effectively a column vector for the 1st axis and row vector for the second, together defining the desired rectangle of values.
But trying to broadcast boolean arrays like this does not work: X[mask1[:,None], mask2]
But that reference section says:
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
In [260]: X[np.ix_(mask1, mask2)]
Out[260]:
array([[1, 2],
[4, 5]])
In [261]: np.ix_(mask1, mask2)
Out[261]:
(array([[0],
[1]], dtype=int32), array([[0, 1]], dtype=int32))
The boolean section of ix_:
if issubdtype(new.dtype, _nx.bool_):
new, = new.nonzero()
So it works with a mix like X[np.ix_(mask1, [0,2])]

One solution would be to use sequential integer indexing and getting the integers for example from np.where:
>>> X[:, np.where(mask1)[0]][np.where(mask2)[0]]
array([[1, 2],
[4, 5]])
or as #user2357112 pointed out in the comments np.ix_ could be used as well. For example:
>>> X[np.ix_(np.where(mask1)[0], np.where(mask2)[0])]
array([[1, 2],
[4, 5]])
Another idea would be to broadcast your masks and then do it in one step would require a reshape afterwards:
>>> X[np.where(mask1[:, None] * mask2)]
array([1, 2, 4, 5])
>>> X[np.where(mask1[:, None] * mask2)].reshape(2, 2)
array([[1, 2],
[4, 5]])

In a more general sense, your question is bout finding the subpart of an array containing certain rows and columns.
main_array = np.array([[1,2,3],[4,5,6]])
mask_ax_0 = np.array([True, True]) # about which rows, i want
mask_ax_1 = np.array([True, True, False]) # which columns, i want
Answer:
mask_2d = np.logical_and(mask_ax_0.reshape(-1,1), mask_ax_1.reshape(1,-1))
sub_array = main_array[mask_2d].reshape(np.sum(mask_ax_0), np.sum(mask_ax_1))
print(sub_array)

You should be using the numpy.ma module.
In particular, you could use mask_rowcols :
X = np.array([[1,2,3],[4,5,6]])
linesmask = np.array([True, True])
colsmask = np.array([True, True, False])
X = X.view(ma.MaskedArray)
for i in range(len(linesmask)):
X.mask[i][0] = not linemask[i]
for j in range(len(colsmask)):
X.mask[0][j] = not colsmask[j]
X = ma.mask_rowcols(X)

Related

Deleting multiple rows and columns together in arrays in Python [duplicate]

Say I have an array like this:
x = [1, 2, 3]
[4, 5, 6]
[7, 8, 9]
And I want to delete both the ith row and column. So if i=1, I'd create (with 0-indexing):
[1, 3]
[7, 9]
Is there an easy way of doing this with a one-liner? I know I can call np.delete() twice, but that seems a little unclean.
It'd be exactly equivalent to np.delete(np.delete(x, idx, 0), idx, 1), where idx is the index of the row/column pair to delete - it'd just look cleaner.
In [196]: x = np.arange(1,10).reshape(3,3)
If you look at np.delete code, you'll see that it's python (not compiled) and takes different approaches depending on how the delete values are specified. One is to make a res array of right size, and copy two slices to it.
Another is to make a boolean mask. For example:
In [197]: mask = np.ones(x.shape[0], bool)
In [198]: mask[1] = 0
In [199]: mask
Out[199]: array([ True, False, True])
Since you are deleting the same row and column, use this indexing:
In [200]: x[mask,:][:,mask]
Out[200]:
array([[1, 3],
[7, 9]])
A 1d boolean mask like this can't be 'broadcasted' in the same ways a integer array can.
We can do the 2d advanced indexing with:
In [201]: idx = np.nonzero(mask)[0]
In [202]: idx
Out[202]: array([0, 2])
In [203]: np.ix_(idx,idx)
Out[203]:
(array([[0],
[2]]),
array([[0, 2]]))
In [204]: x[np.ix_(idx,idx)]
Out[204]:
array([[1, 3],
[7, 9]])
Actually ix_ can work directly from the boolean array(s):
In [207]: np.ix_(mask,mask)
Out[207]:
(array([[0],
[2]]),
array([[0, 2]]))
This isn't a one-liner, but it probably is faster than the double delete, since it strips off all the extra baggage that the more general function requires.
This can be easily achieved by numpy's delete function. It would be:
arr = np.delete(arr, index, 0) # deletes the desired row
arr = np.delete(arr, index, 1) # deletes the desired column at index
The third argument is the axis.

Delete both row and column in numpy array

Say I have an array like this:
x = [1, 2, 3]
[4, 5, 6]
[7, 8, 9]
And I want to delete both the ith row and column. So if i=1, I'd create (with 0-indexing):
[1, 3]
[7, 9]
Is there an easy way of doing this with a one-liner? I know I can call np.delete() twice, but that seems a little unclean.
It'd be exactly equivalent to np.delete(np.delete(x, idx, 0), idx, 1), where idx is the index of the row/column pair to delete - it'd just look cleaner.
In [196]: x = np.arange(1,10).reshape(3,3)
If you look at np.delete code, you'll see that it's python (not compiled) and takes different approaches depending on how the delete values are specified. One is to make a res array of right size, and copy two slices to it.
Another is to make a boolean mask. For example:
In [197]: mask = np.ones(x.shape[0], bool)
In [198]: mask[1] = 0
In [199]: mask
Out[199]: array([ True, False, True])
Since you are deleting the same row and column, use this indexing:
In [200]: x[mask,:][:,mask]
Out[200]:
array([[1, 3],
[7, 9]])
A 1d boolean mask like this can't be 'broadcasted' in the same ways a integer array can.
We can do the 2d advanced indexing with:
In [201]: idx = np.nonzero(mask)[0]
In [202]: idx
Out[202]: array([0, 2])
In [203]: np.ix_(idx,idx)
Out[203]:
(array([[0],
[2]]),
array([[0, 2]]))
In [204]: x[np.ix_(idx,idx)]
Out[204]:
array([[1, 3],
[7, 9]])
Actually ix_ can work directly from the boolean array(s):
In [207]: np.ix_(mask,mask)
Out[207]:
(array([[0],
[2]]),
array([[0, 2]]))
This isn't a one-liner, but it probably is faster than the double delete, since it strips off all the extra baggage that the more general function requires.
This can be easily achieved by numpy's delete function. It would be:
arr = np.delete(arr, index, 0) # deletes the desired row
arr = np.delete(arr, index, 1) # deletes the desired column at index
The third argument is the axis.

using boolean array for indexing in numpy for 2D arrays

I use boolean indexing to select elements from a numpy array as
x = y[t<tmax]
where t a numpy array with as many elements as y. My question is how can I do the same with 2D numpy arrays? I tried
x = y[t<tmax][t<tmax]
This does not seem to work however since it seems to select first the rows and then complains that the second selection has the wrong dimension.
IndexError: boolean index did not match indexed array along dimension 0; dimension is 50 but corresponding boolean dimension is 200
#
Here is an example
x1D = np.array([1,2,3], np.int32)
x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
print(x1D[x1D<3]) --> [1 2]
print(x2D[x1D<3][x1D<3]) --> error
The second print statement produces an error similar to the error shown above. I use
print(x2D[x1D<3])
I get
[[1 2 3]
[1 2 3]]
but I want
[[1 2]
[1 2]]
In [28]: x1D = np.array([1,2,3], np.int32)
...: x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
The 1d mask:
In [29]: x1D<3
Out[29]: array([ True, True, False])
applied to the 1d array (same size):
In [30]: x1D[_]
Out[30]: array([1, 2], dtype=int32)
applied to the 2d it selects 2 rows:
In [31]: x2D[_29]
Out[31]:
array([[1, 2, 3],
[1, 2, 3]], dtype=int32)
It can be used again to select columns - but note the : place holder for the row index:
In [32]: _[:, _29]
Out[32]:
array([[1, 2],
[1, 2]], dtype=int32)
If we generate an indexing array from that mask, we can do the indexing with one step:
In [37]: idx = np.nonzero(x1D<3)
In [38]: idx
Out[38]: (array([0, 1]),)
In [39]: x2D[idx[0][:,None], idx[0]]
Out[39]:
array([[1, 2],
[1, 2]], dtype=int32)
An alternate way of writing this '2d' indexing:
In [41]: x2D[ [[0],[1]], [[0,1]] ]
Out[41]:
array([[1, 2],
[1, 2]], dtype=int32)
ix_ is a convenient tool for tweaking the indexing dimensions:
In [42]: x2D[np.ix_(idx[0], idx[0])]
Out[42]:
array([[1, 2],
[1, 2]], dtype=int32)
Or passing the boolean mask to ix_:
In [44]: np.ix_(_29, _29)
Out[44]:
(array([[0],
[1]]), array([[0, 1]]))
In [45]: x2D[np.ix_(_29, _29)]
Out[45]:
array([[1, 2],
[1, 2]], dtype=int32)
Writing In[32] so it's close to to your try:
In [46]: x2D[x1D<3][:, x1D<3]
Out[46]:
array([[1, 2],
[1, 2]], dtype=int32)

Does python have a predicate to test that row of a matrix are sorted?

I want to check that a sequence of N numpy vectors of integers is lexicographically ordered. All the vectors in the sequence have shape 1 × 2. (The value of N is big, so I want to avoid sorting this sequence if it is already sorted.)
Does Python, or numpy, already offer a predicate to perform such a test?
(It would not be hard to roll my own, but I prefer to use built-in tools if they exist.)
You can use np.diff and np.any:
A = np.array([[1,2,3], [2,3,1], [3, 4, 5]])
diff = np.diff(A, axis=0)
print np.all(diff>=0, axis=0)
To have an issorted predicate you need a well defined sort, or at least a clear method of comparing items.
To follow on my question about the nature of your data. It sounds as though you have something like this:
In [130]: x=[[1,3],[3,4],[1,2],[3,1],[0,2],[6,5]]
In [131]: x1=[np.array(i).reshape(1,2) for i in x]
In [132]: x1
Out[132]:
[array([[1, 3]]),
array([[3, 4]]),
array([[1, 2]]),
array([[3, 1]]),
array([[0, 2]]),
array([[6, 5]])]
The Python sort is lexographic - that is, sort of the 1st element on the sublists, and then on the 2nd element.
In [137]: sorted(x)
Out[137]: [[0, 2], [1, 2], [1, 3], [3, 1], [3, 4], [6, 5]]
numpy sorts don't preserve the pairs - depending on the axis specification it sorts by column, or row (or flat). But the np.sort doc does say that complex numbers are sorted lexographically:
In [157]: xj = np.dot(x,[1,1j])
In [158]: xj
Out[158]: array([ 1.+3.j, 3.+4.j, 1.+2.j, 3.+1.j, 0.+2.j, 6.+5.j])
In [159]: np.sort(xj)
Out[159]: array([ 0.+2.j, 1.+2.j, 1.+3.j, 3.+1.j, 3.+4.j, 6.+5.j])
This matches the Python list sort.
If my guess as to your data type is correct, a comparison based test would use something like:
In [167]: [i.__lt__(j) for i,j in zip(x[:-1],x[1:])]
Out[167]: [True, False, True, False, True]
In [168]: xs=sorted(x)
In [169]: [i.__lt__(j) for i,j in zip(xs[:-1],xs[1:])]
Out[169]: [True, True, True, True, True]
That also works for the complex array:
In [173]: xjs=np.sort(xj)
In [174]: [i.__lt__(j) for i,j in zip(xjs[:-1],xjs[1:])]
Out[174]: [True, True, True, True, True]
For large lists I'd try one of the itertools for short circuiting iteration.
But when applied to the (plain) array, it is clear that the question of whether it is sorted or not needs further specification.
In [172]: [i.__lt__(j) for i,j in zip(x1[:-1],x1[1:])]
Out[172]:
[array([[ True, True]], dtype=bool),
array([[False, False]], dtype=bool),
array([[ True, False]], dtype=bool),
array([[False, True]], dtype=bool),
array([[ True, True]], dtype=bool)]
By the way, a list of (2,1) arrays would look something like this:
[np.array(i).reshape(1,2) for i in x]
[array([[1, 3]]),
array([[3, 4]]),
array([[1, 2]]),
array([[3, 1]]),
array([[0, 2]]),
array([[6, 5]])]
which if turned into an array would have a (6,1,2) shape. Or did you want a (6,2) array?
In [179]: np.array(x)
Out[179]:
array([[1, 3],
[3, 4],
[1, 2],
[3, 1],
[0, 2],
[6, 5]])
numpy has lexsort, but this does a sort, not a test of whether the data is sorted. None-the-less, running it on sorted data is about twice as fast as unsorted.
import numpy as np
import timeit
def data(N):
return np.random.randint(0,10,(N,2))
def get_sorted(x):
return x[np.lexsort(x.T)]
x = data(5)
y = get_sorted(x)
print x # to verify lex sorting
print
print y
print
x = data(1000)
y = get_sorted(x)
# to test the time for sorted vs unsorted data
print timeit.timeit("np.lexsort(x.T)", "from __main__ import np, x", number=1000)
print timeit.timeit("np.lexsort(y.T)", "from __main__ import np, y", number=1000)
And here are the results:
[[6 7] # unsorted
[4 3]
[6 7]
[9 2]
[7 3]]
[[9 2] # sorted by the second column first
[4 3]
[7 3]
[6 7]
[6 7]]
0.0788 # time to lex sort 1000x2 unsorted data values
0.0381 # time to lex sort 1000x2 pre-sorted data values
Note also that the speed of python vs numpy will depend on the list, because python can sometime short-circuit its tests. So if you think that your list will generally be unsorted, a pure python solution could figure this out in the first few values, which could be much faster; whereas numpy solutions will generally work the the entire array.

Python: Differentiating between row and column vectors

Is there a good way of differentiating between row and column vectors in numpy? If I was to give one a vector, say:
from numpy import *
v = array([1,2,3])
they wouldn't be able to say weather I mean a row or a column vector. Moreover:
>>> array([1,2,3]) == array([1,2,3]).transpose()
array([ True, True, True])
Which compares the vectors element-wise.
I realize that most of the functions on vectors from the mentioned modules don't need the differentiation. For example outer(a,b) or a.dot(b) but I'd like to differentiate for my own convenience.
You can make the distinction explicit by adding another dimension to the array.
>>> a = np.array([1, 2, 3])
>>> a
array([1, 2, 3])
>>> a.transpose()
array([1, 2, 3])
>>> a.dot(a.transpose())
14
Now force it to be a column vector:
>>> a.shape = (3,1)
>>> a
array([[1],
[2],
[3]])
>>> a.transpose()
array([[1, 2, 3]])
>>> a.dot(a.transpose())
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
Another option is to use np.newaxis when you want to make the distinction:
>>> a = np.array([1, 2, 3])
>>> a
array([1, 2, 3])
>>> a[:, np.newaxis]
array([[1],
[2],
[3]])
>>> a[np.newaxis, :]
array([[1, 2, 3]])
Use double [] when writing your vectors.
Then, if you want a row vector:
row_vector = array([[1, 2, 3]]) # shape (1, 3)
Or if you want a column vector:
col_vector = array([[1, 2, 3]]).T # shape (3, 1)
The vector you are creating is neither row nor column. It actually has 1 dimension only. You can verify that by
checking the number of dimensions myvector.ndim which is 1
checking the myvector.shape, which is (3,) (a tuple with one element only). For a row vector is should be (1, 3), and for a column (3, 1)
Two ways to handle this
create an actual row or column vector
reshape your current one
You can explicitly create a row or column
row = np.array([ # one row with 3 elements
[1, 2, 3]
]
column = np.array([ # 3 rows, with 1 element each
[1],
[2],
[3]
])
or, with a shortcut
row = np.r_['r', [1,2,3]] # shape: (1, 3)
column = np.r_['c', [1,2,3]] # shape: (3,1)
Alternatively, you can reshape it to (1, n) for row, or (n, 1) for column
row = my_vector.reshape(1, -1)
column = my_vector.reshape(-1, 1)
where the -1 automatically finds the value of n.
I think you can use ndmin option of numpy.array. Keeping it to 2 says that it will be a (4,1) and transpose will be (1,4).
>>> a = np.array([12, 3, 4, 5], ndmin=2)
>>> print a.shape
>>> (1,4)
>>> print a.T.shape
>>> (4,1)
If you want a distiction for this case I would recommend to use a matrix instead, where:
matrix([1,2,3]) == matrix([1,2,3]).transpose()
gives:
matrix([[ True, False, False],
[False, True, False],
[False, False, True]], dtype=bool)
You can also use a ndarray explicitly adding a second dimension:
array([1,2,3])[None,:]
#array([[1, 2, 3]])
and:
array([1,2,3])[:,None]
#array([[1],
# [2],
# [3]])
You can store the array's elements in a row or column as follows:
>>> a = np.array([1, 2, 3])[:, None] # stores in rows
>>> a
array([[1],
[2],
[3]])
>>> b = np.array([1, 2, 3])[None, :] # stores in columns
>>> b
array([[1, 2, 3]])
If I want a 1x3 array, or 3x1 array:
import numpy as np
row_arr = np.array([1,2,3]).reshape((1,3))
col_arr = np.array([1,2,3]).reshape((3,1)))
Check your work:
row_arr.shape #returns (1,3)
col_arr.shape #returns (3,1)
I found a lot of answers here are helpful, but much too complicated for me. In practice I come back to shape and reshape and the code is readable: very simple and explicit.
When I tried to compute w^T * x using numpy, it was super confusing for me as well. In fact, I couldn't implement it myself. So, this is one of the few gotchas in NumPy that we need to acquaint ourselves with.
As far as 1D array is concerned, there is no distinction between a row vector and column vector. They are exactly the same.
Look at the following examples, where we get the same result in all cases, which is not true in (the theoretical sense of) linear algebra:
In [37]: w
Out[37]: array([0, 1, 2, 3, 4])
In [38]: x
Out[38]: array([1, 2, 3, 4, 5])
In [39]: np.dot(w, x)
Out[39]: 40
In [40]: np.dot(w.transpose(), x)
Out[40]: 40
In [41]: np.dot(w.transpose(), x.transpose())
Out[41]: 40
In [42]: np.dot(w, x.transpose())
Out[42]: 40
With that information, now let's try to compute the squared length of the vector |w|^2.
For this, we need to transform w to 2D array.
In [51]: wt = w[:, np.newaxis]
In [52]: wt
Out[52]:
array([[0],
[1],
[2],
[3],
[4]])
Now, let's compute the squared length (or squared magnitude) of the vector w :
In [53]: np.dot(w, wt)
Out[53]: array([30])
Note that we used w, wt instead of wt, w (like in theoretical linear algebra) because of shape mismatch with the use of np.dot(wt, w). So, we have the squared length of the vector as [30]. Maybe this is one of the ways to distinguish (numpy's interpretation of) row and column vector?
And finally, did I mention that I figured out the way to implement w^T * x ? Yes, I did :
In [58]: wt
Out[58]:
array([[0],
[1],
[2],
[3],
[4]])
In [59]: x
Out[59]: array([1, 2, 3, 4, 5])
In [60]: np.dot(x, wt)
Out[60]: array([40])
So, in NumPy, the order of the operands is reversed, as evidenced above, contrary to what we studied in theoretical linear algebra.
P.S. : potential gotchas in numpy
It looks like Python's Numpy doesn't distinguish it unless you use it in context:
"You can have standard vectors or row/column vectors if you like. "
" :) You can treat rank-1 arrays as either row or column vectors. dot(A,v) treats v as a column vector, while dot(v,A) treats v as a row vector. This can save you having to type a lot of transposes. "
Also, specific to your code: "Transpose on a rank-1 array does nothing. "
Source:
Link
Here's another intuitive way. Suppose we have:
>>> a = np.array([1, 3, 4])
>>> a
array([1, 3, 4])
First we make a 2D array with that as the only row:
>>> a = np.array([a])
>>> a
array([[1, 3, 4]])
Then we can transpose it:
>>> a.T
array([[1],
[3],
[4]])
row vectors are (1,0) tensor, vectors are (0, 1) tensor. if using v = np.array([[1,2,3]]), v become (0,2) tensor. Sorry, i am confused.
The excellent Pandas library adds features to numpy that make these kinds of operations more intuitive IMO. For example:
import numpy as np
import pandas as pd
# column
df = pd.DataFrame([1,2,3])
# row
df2 = pd.DataFrame([[1,2,3]])
You can even define a DataFrame and make a spreadsheet-like pivot table.

Categories

Resources