Python indexing numpy array using a smaller boolean array - python

https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
If obj.ndim == x.ndim, x[obj] returns a 1-dimensional array filled
with the elements of x corresponding to the True values of obj. The
search order will be row-major, C-style. If obj has True values at
entries that are outside of the bounds of x, then an index error will
be raised. If obj is smaller than x it is identical to filling it with
False.
I read in the NumPy reference that I can index a larger array with a smaller boolean array, and that the remaining entries would automatically be filled with False.
Example:
From an array, select all rows which sum up to less than or equal to two:
>>> x = np.array([[0, 1], [1, 1], [2, 2]])
>>> rowsum = x.sum(-1)
>>> x[rowsum <= 2, :]
array([[0, 1],[1, 1]])
But if rowsum would have two dimensions as well:
>>> rowsum = x.sum(-1, keepdims=True)
>>> rowsum.shape
(3, 1)
>>> x[rowsum <= 2, :] # fails
IndexError: too many indices
>>> x[rowsum <= 2]
array([0, 1])
The last one giving only the first elements because of the extra
dimension.
But the example simply doesn't work. It raises "IndexError: boolean index did not match indexed array along dimension 1; dimension is 2 but corresponding boolean dimension is 1".
How can I make it work? I'm using Python 3.6.3 and NumPy 1.13.3.

As of NumPy 1.11, that documented example is not compatible with the new default behaviour (see boolean-indexing-changes in the release notes):
Boolean indexing changes.
...
...
Boolean indexes must match the dimension of the axis that
they index.
...
The internals have been optimized, but the docs have not been updated yet.
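To illustrate the point above, here is a minimal sketch, assuming NumPy 1.13 or later. It shows that a smaller boolean index now raises IndexError, and that the old "fill with False" behaviour can be reproduced by padding the mask by hand. The names small, full and picked are only for illustration.

```python
import numpy as np

x = np.array([[0, 1], [1, 1], [2, 2]])
small = x.sum(-1, keepdims=True) <= 2    # boolean array of shape (3, 1)

# A boolean index whose shape does not match x is rejected.
try:
    x[small]
    raised = False
except IndexError:
    raised = True

# Emulate the old documented behaviour: pad the mask with False
# until it has the same shape as x, then index with the full mask.
full = np.zeros(x.shape, dtype=bool)
full[:small.shape[0], :small.shape[1]] = small
picked = x[full]                         # 1-D array of the selected elements
```

With the padded mask, only the first element of each qualifying row is selected, which matches the array([0, 1]) shown in the old documentation.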

I think what you are looking for is NumPy broadcasting.
import numpy as np
x = np.array([[0, 1], [1, 1], [2, 2]])
rowsum = x.sum(axis=1)
x[rowsum <= 2]
Gives:
array([[0, 1],
       [1, 1]])
The problem is that you used keepdims=True, which makes the sum a column vector of shape (3, 1) rather than a rank-one array of shape (3,). A boolean index has to match the axis it indexes, so only the rank-one version works as a row mask.
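If the keepdims=True result is needed elsewhere, the mask can still be reduced to one dimension before indexing. This is a small sketch of two ways to do that; the variable names are illustrative.

```python
import numpy as np

x = np.array([[0, 1], [1, 1], [2, 2]])
rowsum = x.sum(-1, keepdims=True)        # shape (3, 1)

# Take the single column of the mask so it has shape (3,)
mask = (rowsum <= 2)[:, 0]
rows = x[mask]

# Equivalent: drop the singleton axis explicitly
rows_alt = x[np.squeeze(rowsum <= 2, axis=1)]
```

Both forms select whole rows, because the 1-D mask has one entry per row of x.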

Related

Numpy.unique on 3d array with axis=2 but not working as expected

Consider the following code. With axis=2, I expected it to reduce the duplicated [1 1] to [1], but it does not. Why doesn't it perform the unique operation on the third axis?
arr = np.array([[[1,1], [1,1], [1,1]],
[[7,1], [10,1], [10,1]],
[[1,1], [1,1], [1,1]]])
print(np.unique(arr, axis=0))
print("----------------")
print(np.unique(arr, axis=1))
print("----------------")
print(np.unique(arr, axis=2))
I tried many other examples, and it still does not work on the third axis.
Note this from the documentation (citing help(np.unique)):
The axis to operate on. If None, ar will be flattened. If an integer, the subarrays indexed by the given axis will be flattened and treated as the elements of a 1-D array with the dimension of the given axis […]
When an axis is specified the subarrays indexed by the axis are sorted. […] The result is that the flattened subarrays are sorted in lexicographic order starting with the first element.
So in your case it will sort and compare the sub-array arr[:, :, 0].flatten(), which is [1, 1, 1, 7, 10, 10, 1, 1, 1], with arr[:, :, 1].flatten(), which is [1, 1, 1, 1, 1, 1, 1, 1, 1].
These are not the same, so nothing is removed. The only change is that the second sub-array is placed before the first, because it comes first in lexicographic order.
I assume what you wanted was to get rid of the duplicate [1, 1] entries. However, np.unique cannot work that way because these are arrays, not lists. That behavior would leave a different number of entries in arr[0] than in arr[1], and an array cannot be ragged like that.
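The following sketch checks that explanation on the array from the question. The last step assumes the goal was the set of distinct [a, b] pairs, which is a guess at the intent; in that case the pairs can be flattened into rows first.

```python
import numpy as np

arr = np.array([[[1, 1], [1, 1], [1, 1]],
                [[7, 1], [10, 1], [10, 1]],
                [[1, 1], [1, 1], [1, 1]]])

u0 = np.unique(arr, axis=0)   # arr[0] and arr[2] are equal, one is dropped
u1 = np.unique(arr, axis=1)   # arr[:, 1] and arr[:, 2] are equal, one is dropped
u2 = np.unique(arr, axis=2)   # arr[:, :, 0] and arr[:, :, 1] differ, both kept

# Assumed intent: the distinct pairs, regardless of where they sit.
pairs = np.unique(arr.reshape(-1, 2), axis=0)
```

Along axis 2 the two slices are only reordered, so u2 has the same shape as arr with its last axis reversed.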

Dot product with numpy gives array with size (n, )

I am trying to get the dot product of two arrays in Python using NumPy. The output is an array of shape (n,). The shape says my array has no column dimension, even though I can see the results when I print it. Why does my array have no column, and how do I fix this?
My goal is to calculate y - np.dot(x,b). The issue is that y has shape (124, 1) while np.dot(x,b) has shape (124,).
Thanks
It seems that you are trying to subtract two arrays of different shapes. Fortunately, they differ by only a single extra axis, so there are two ways of handling it.
(1) You slice the y array to match the shape of the dot(x,b) array:
y = y[:,0]
print(y-np.dot(x,b))
(2) You add an additional axis on the np.dot(x,b) array:
dot = np.dot(x,b)
dot = dot[:,None]
print(y-dot)
Hope this helps
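A short sketch of why the shapes matter, using made-up data. It assumes b is 1-D, which is what produces the (124,) result, and k = 3 is an arbitrary number of columns. Subtracting the mismatched shapes directly does not fail; it broadcasts to a 124 x 124 array, which is rarely what is wanted.

```python
import numpy as np

n, k = 124, 3                              # k is an assumed column count
x = np.arange(n * k, dtype=float).reshape(n, k)
b = np.ones(k)                             # 1-D, shape (k,)
y = np.zeros((n, 1))                       # shape (124, 1), as in the question

pred = np.dot(x, b)                        # shape (124,)
wrong = y - pred                           # silently broadcasts to (124, 124)

resid_1d = y[:, 0] - pred                  # option 1: both operands (124,)
resid_2d = y - pred[:, None]               # option 2: both operands (124, 1)

# Alternative: make b a column so the dot product is already (124, 1)
resid_col = y - np.dot(x, b.reshape(-1, 1))
```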
It may depend on the dimensions of your arrays.
For example:
a = [1, 0]
b = [[4, 1], [2, 2]]
c = np.dot(a,b)
gives
array([4, 1])
and its shape is (2,)
but if you change a to:
a = [[1, 0],[1,1]]
then the result is:
array([[4, 1],
       [6, 3]])
and its shape is (2,2)

Numpy array slice using tuple

I've read the NumPy docs on slicing (especially the bottom, where it discusses variable array indexing):
https://docs.scipy.org/doc/numpy/user/basics.indexing.html
But I'm still not sure how I could do the following: Write a method that either returns a 3D set of indices, or a 4D set of indices that are then used to access an array. I want to write a method for a base class, but the classes that derive from it access either 3D or 4D depending on which derived class is instantiated.
Example Code to illustrate idea:
import numpy as np
a = np.ones([2,2,2,2])
size = np.shape(a)
print(size)
for i in range(size[0]):
    for j in range(size[1]):
        for k in range(size[2]):
            for p in range(size[3]):
                a[i,j,k,p] = i*size[1]*size[2]*size[3] + j*size[2]*size[3] + k*size[3] + p
print(a)
print('compare')
indices = (0,:,0,0)  # what I would like to write, but this is a SyntaxError
print(a[0,:,0,0])
print(a[indices])
In short, I'm trying to build a tuple (or something similar) that can be used for either of the following accesses, depending on how I fill it:
a[i, 0, :, 1]
a[i, :, 1]
The slice function looked promising, but it seems to require a range, and I just want a ":", i.e. the whole dimension. What options are there for variable-dimension NumPy array access?
In [324]: a = np.arange(8).reshape(2,2,2)
In [325]: a
Out[325]:
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
slicing:
In [326]: a[0,:,0]
Out[326]: array([0, 2])
In [327]: idx = (0,slice(None),0) # interpreter converts : into slice object
In [328]: a[idx]
Out[328]: array([0, 2])
In [331]: idx
Out[331]: (0, slice(None, None, None), 0)
In [332]: np.s_[0,:,0] # indexing trick to generate same
Out[332]: (0, slice(None, None, None), 0)
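Applied to the base-class idea in the question, this could look like the sketch below. The class and method names (Base, ThreeD, FourD, indices, select) are hypothetical; only slice(None) and np.s_ come from NumPy.

```python
import numpy as np

class Base:
    """Hypothetical base class: subclasses decide how many axes to index."""

    def indices(self, i):
        raise NotImplementedError

    def select(self, arr, i):
        # A tuple of ints and slice objects behaves like arr[i, :, ...]
        return arr[self.indices(i)]

class ThreeD(Base):
    def indices(self, i):
        return (i, slice(None), 1)       # same as arr[i, :, 1]

class FourD(Base):
    def indices(self, i):
        return np.s_[i, 0, :, 1]         # same as arr[i, 0, :, 1]

a3 = np.arange(8).reshape(2, 2, 2)
a4 = np.arange(16).reshape(2, 2, 2, 2)
r3 = ThreeD().select(a3, 0)
r4 = FourD().select(a4, 1)
```

The base class never needs to know the dimensionality, since it only passes the tuple through to the indexing operation.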
Your code appears to work the way you want when you use :. The two examples
(a[i, 0, :, 7], a[i, :, 7])
don't work because the index 7 is out of range for the array. If you change the 7 to something in range, like 1, the expression returns a value, which I believe is what you are looking for.

sort numpy 2d array by indice of column

I am using NumPy in Python. I have a 1D (nx1) array and a 2D (nxm) array. I used argsort to get the sort indices of the 1D array. Now I want to use those indices to reorder the columns of my 2D array.
How can I do that?
For example:
>>> array1d = np.array([1, 3, 0])
>>> array2d = np.array([[1,2,3],[4,5,6]])
>>> array1d_indice = np.argsort(array1d)
>>> array1d_indice
array([2, 0, 1], dtype=int64)
I want to use array1d_indice to sort the columns of array2d to get:
[[3, 1, 2],
 [6, 4, 5]]
Any easier way to achieve this is also welcome.
If you mean that you want the columns reordered based on the vector, then use argsort on the vector:
vi = np.argsort(vector)
Then, to arrange the columns of array in that order:
sorted_array = array[:, tuple(vi)]
To reorder rows instead, swap the positions of : and tuple(vi).
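Here is the same approach run on the arrays from the question, as a minimal sketch. The index array returned by argsort can be used directly, so the tuple() conversion is optional. The row example uses the transpose only so that the number of rows matches the length of the index.

```python
import numpy as np

array1d = np.array([1, 3, 0])
array2d = np.array([[1, 2, 3], [4, 5, 6]])

order = np.argsort(array1d)              # indices that would sort array1d
by_columns = array2d[:, order]           # reorder the columns

# Reordering rows works the same way on the first axis.
by_rows = array2d.T[order, :]
```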

Selecting a column of a numpy array

I am somewhat confused about selecting a column of a NumPy array, because the result is different from MATLAB and even from a NumPy matrix. Please see the following cases.
In MATLAB, we use the following command to select a column vector out of a matrix.
x = [0, 1; 2 3]
out = x(:, 1)
Then out becomes [0; 2], which is a column vector.
To do the same thing with a NumPy Matrix
x = np.matrix([[0, 1], [2, 3]])
out = x[:, 0]
Then the output is np.matrix([[0], [2]]) which is expected, and it is a column vector.
However, in the case of a NumPy array
x = np.array([[0, 1], [2, 3]])
out = x[:, 0]
The output is np.array([0, 2]), which is 1-dimensional, so it is not a column vector. I expected np.array([[0], [2]]).
I have two questions.
1. Why is the output in the NumPy array case different from the NumPy matrix case? This confuses me a lot, but I think there might be a reason for it.
2. To get a column vector from a 2-D NumPy array, do I need an extra step such as expand_dims?
x = np.array([[0, 1], [2, 3]])
out = np.expand_dims(x[:, 0], axis = 1)
In MATLAB everything has at least 2 dimensions. In older versions of MATLAB, 2-D was all there was; now arrays can have more. np.matrix is modeled on that old MATLAB.
What does MATLAB do when you index a 3d matrix?
np.array is more general. It can have 0, 1, 2 or more dimensions.
x[:, 0]
x[0, :]
both select one column or row, and return an array with one less dimension.
x[:, [0]]
x[[0], :]
would return 2d arrays, with a singleton dimension.
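The shapes can be checked directly. This sketch compares scalar indexing with a few ways of keeping the column as a 2-D array; the variable names are illustrative.

```python
import numpy as np

x = np.array([[0, 1], [2, 3]])

flat = x[:, 0]                                # scalar index drops the axis
col_list = x[:, [0]]                          # list index keeps the axis
col_slice = x[:, 0:1]                         # slice keeps the axis
col_newaxis = x[:, 0][:, np.newaxis]          # re-add the axis afterwards
col_expand = np.expand_dims(x[:, 0], axis=1)  # same idea, as in the question
```

All four 2-D variants hold the same values, [[0], [2]], so the choice between them is mostly a matter of readability.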
In Octave (a MATLAB clone), indexing produces inconsistent results, depending on which end of the array I index:
octave:7> x=ones(2,3,4);
octave:8> size(x)
ans =
2 3 4
octave:9> size(x(1,:,:))
ans =
1 3 4
octave:10> size(x(:,:,1))
ans =
2 3
MATLAB/Octave adds dimensions at the end, and apparently readily squeezes them out on that side as well.
numpy orders the dimensions in the other direction, and can add dimensions at the start as needed. But it is consistent when indexing: a scalar index always removes that axis, whichever side it is on.
The fact that numpy arrays can have any number of dimensions, while MATLAB has a minimum of 2, is a crucial difference that often trips up MATLAB users. But one isn't any more logical than the other. MATLAB's practice is more a matter of history than of general principles.
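For comparison with the Octave session, here is the NumPy counterpart as a small sketch. A scalar index removes its axis on either side, a length-one slice keeps it, and broadcasting treats a missing dimension as a leading one.

```python
import numpy as np

x = np.ones((2, 3, 4))

first = x[0, :, :].shape       # scalar on the first axis -> (3, 4)
last = x[:, :, 0].shape        # scalar on the last axis  -> (2, 3)
kept = x[0:1, :, :].shape      # length-one slice keeps the axis -> (1, 3, 4)

# Broadcasting adds dimensions at the start: (3, 4) is treated as (1, 3, 4)
total = x + np.ones((3, 4))
```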
