Subset numpy array based on another array - python

I have 2 numpy arrays:
arr_a = array(['1m_nd', '2m_nd', '1m_4wk'],
dtype='<U15')
arr_b = array([0, 1, 1])
I want to select elements from arr_a based on arr_b. I am doing this:
arr_a[arr_b], but I get this as result:
array(['1m_nd', '2m_nd', '2m_nd'],
dtype='<U15')
instead of:
array(['2m_nd', '1m_4wk'],
dtype='<U15')
How do i fix this?

You need to pass it a boolean array, for example:
>>> arr_a[arr_b>0]
array(['2m_nd', '1m_4wk'],
dtype='<U15')

Given arr_a and arr_b, Running the following will give the boolean array for each of the elements in arr_b whose value is 1 => True and 0 => False . Correspondingly the boolean values are checked with the index value in arr_a. Here is the line of code you'd need.
>>> arr_a[arr_b == 1]
array([u'2m_nd', u'1m_4wk'],
dtype='<U15')

Related

How to extract numpy array stored in tuple?

Let's consider very easy example:
import numpy as np
a = np.array([0, 1, 2])
print(np.where(a < -1))
(array([], dtype=int64),)
print(np.where(a < 2))
(array([0, 1]),)
I'm wondering if its possible to extract length of those arrays, i.e. I want to know that the first array is empty, and the second is not. Usually it can be easily done with len function, however now numpy array is stored in tuple. Do you know how it can be done?
Just use this:
import numpy as np
a = np.array([0, 1, 2])
x = np.where(a < 2)[0]
print(len(x))
Outputs 2
To find the number of values in the array satisfying the predicate, you can skip np.where and use np.count_nonzero instead:
a = np.array([0, 1, 2])
print(np.count_nonzero(a < -1))
>>> 0
print(np.count_nonzero(a < 2))
>>> 2
If you need to know whether there are any values in a that satisfy the predicate, but not how many there are, a cleaner way of doing so is with np.any:
a = np.array([0, 1, 2])
print(np.any(a < -1))
>>> False
print(np.any(a < 2))
>>> True
np.where takes 3 arguments: condition, x, y where last two are arrays and are optional. When provided the funciton returns element from x for indices where condition is True, and y otherwise. When only condition is provided it acts like np.asarray(condition).nonzero() and returns a tuple, as in your case. For more details see Note at np.where.
Alternatively, because you need only length of sublist where condition is True, you can simply use np.sum(condition):
a = np.array([0, 1, 2])
print(np.sum(a < -1))
>>> 0
print(np.sum(a < 2))
>>> 2

How to check numpy array is empty? [duplicate]

How can I check whether a numpy array is empty or not?
I used the following code, but this fails if the array contains a zero.
if not self.Definition.all():
Is this the solution?
if self.Definition == array([]):
You can always take a look at the .size attribute. It is defined as an integer, and is zero (0) when there are no elements in the array:
import numpy as np
a = np.array([])
if a.size == 0:
# Do something when `a` is empty
https://numpy.org/devdocs/user/quickstart.html (2020.04.08)
NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes.
(...) NumPy’s array class is called ndarray. (...) The more important attributes of an ndarray object are:
ndarray.ndim
the number of axes (dimensions) of the array.
ndarray.shape
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
ndarray.size
the total number of elements of the array. This is equal to the product of the elements of shape.
One caveat, though.
Note that np.array(None).size returns 1!
This is because a.size is equivalent to np.prod(a.shape),
np.array(None).shape is (), and an empty product is 1.
>>> import numpy as np
>>> np.array(None).size
1
>>> np.array(None).shape
()
>>> np.prod(())
1.0
Therefore, I use the following to test if a numpy array has elements:
>>> def elements(array):
... return array.ndim and array.size
>>> elements(np.array(None))
0
>>> elements(np.array([]))
0
>>> elements(np.zeros((2,3,4)))
24
Why would we want to check if an array is empty? Arrays don't grow or shrink in the same that lists do. Starting with a 'empty' array, and growing with np.append is a frequent novice error.
Using a list in if alist: hinges on its boolean value:
In [102]: bool([])
Out[102]: False
In [103]: bool([1])
Out[103]: True
But trying to do the same with an array produces (in version 1.18):
In [104]: bool(np.array([]))
/usr/local/bin/ipython3:1: DeprecationWarning: The truth value
of an empty array is ambiguous. Returning False, but in
future this will result in an error. Use `array.size > 0` to
check that an array is not empty.
#!/usr/bin/python3
Out[104]: False
In [105]: bool(np.array([1]))
Out[105]: True
and bool(np.array([1,2]) produces the infamous ambiguity error.
edit
The accepted answer suggests size:
In [11]: x = np.array([])
In [12]: x.size
Out[12]: 0
But I (and most others) check the shape more than the size:
In [13]: x.shape
Out[13]: (0,)
Another thing in its favor is that it 'maps' on to an empty list:
In [14]: x.tolist()
Out[14]: []
But there are other other arrays with 0 size, that aren't 'empty' in that last sense:
In [15]: x = np.array([[]])
In [16]: x.size
Out[16]: 0
In [17]: x.shape
Out[17]: (1, 0)
In [18]: x.tolist()
Out[18]: [[]]
In [19]: bool(x.tolist())
Out[19]: True
np.array([[],[]]) is also size 0, but shape (2,0) and len 2.
While the concept of an empty list is well defined, an empty array is not well defined. One empty list is equal to another. The same can't be said for a size 0 array.
The answer really depends on
what do you mean by 'empty'?
what are you really test for?

Python numpy array sum over certain indices

How to perform a sum just for a list of indices over numpy array, e.g., if I have an array a = [1,2,3,4] and a list of indices to sum, indices = [0, 2] and I want a fast operation to give me the answer 4 because the value for summing value at index 0 and index 2 in a is 4
You can use sum directly after indexing with indices:
a = np.array([1,2,3,4])
indices = [0, 2]
a[indices].sum()
The accepted a[indices].sum() approach copies data and creates a new array, which might cause problem if the array is large. np.sum actually has an argument to mask out colums, you can just do
np.sum(a, where=[True, False, True, False])
Which doesn't copy any data.
The mask array can be obtained by:
mask = np.full(4, False)
mask[np.array([0,2])] = True
Try:
>>> a = [1,2,3,4]
>>> indices = [0, 2]
>>> sum(a[i] for i in indices)
4
Faster
If you have a lot of numbers and you want high speed, then you need to use numpy:
>>> import numpy as np
>>> a = np.array([1,2,3,4])
>>> a[indices]
array([1, 3])
>>> np.sum(a[indices])
4

How to quickly grab specific indices from a numpy array?

But I don't have the index values, I just have ones in those same indices in a different array. For example, I have
a = array([3,4,5,6])
b = array([0,1,0,1])
Is there some NumPy method than can quickly look at both of these and extract all values from a whose indices match the indices of all 1's in b? I want it to result in:
array([4,6])
It is probably worth mentioning that my a array is multidimensional, while my b array will always have values of either 0 or 1. I tried using NumPy's logical_and function, though this returns ValueError with a and b having different dimensions:
a = numpy.array([[3,2], [4,5], [6,1]])
b = numpy.array([0, 1, 0])
print numpy.logical_and(a,b)
ValueError: operands could not be broadcast together with shapes (3,2) (3,)
Though this method does seem to work if a is flat. Either way, the return type of numpy.logical_and() is a boolean, which I do not want. Is there another way? Again, in the second example above, the desired return would be
array([[4,5]])
Obviously I could write a simple loop to accomplish this, I'm just looking for something a bit more concise.
Edit:
This will introduce more constraints, I should also mention that each element of the multidimensional array a may be any arbitrary length, that does not match its neighbour.
You can simply use fancy indexing.
b == 1
will give you a boolean array:
>>> from numpy import array
>>> a = array([3,4,5,6])
>>> b = array([0,1,0,1])
>>> b==1
array([False, True, False, True], dtype=bool)
which you can pass as an index to a.
>>> a[b==1]
array([4, 6])
Demo for your second example:
>>> a = array([[3,2], [4,5], [6,1]])
>>> b = array([0, 1, 0])
>>> a[b==1]
array([[4, 5]])
You could use compress:
>>> a = np.array([3,4,5,6])
>>> b = np.array([0,1,0,1])
>>> a.compress(b)
array([4, 6])
You can provide an axis argument for multi-dimensional cases:
>>> a2 = np.array([[3,2], [4,5], [6,1]])
>>> b2 = np.array([0, 1, 0])
>>> a2.compress(b2, axis=0)
array([[4, 5]])
This method will work even if the axis of a you're indexing against is a different length to b.

numpy, mapping one array to parts of another

lets say i have one array
a = numpy.arange(8*6*3).reshape((8, 6, 3))
#and another:
l = numpy.array([[0,0],[0,1],[1,1]]) #an array of indexes to array "a"
#and yet another:
b = numpy.array([[0,0,5],[0,1,0],[1,1,3]])
where "l" and "b" are of equal length,
and i want to say
a[l] = b
such that a[0][0] becomes [0,0,5], a[0][1] becomes [0,1,0] etc.
it seems to work fine when ive got one-dimensional arrays, but it gives me the error
ValueError: array is not broadcastable to correct shape
when i try it with a 3-dimensional array.
import numpy as np
a = np.arange(8*6*3).reshape((8, 6, 3))
l = np.array([[0,0],[0,1],[1,1]]) #an array of indexes to array "a"
b = np.array([[0,0,5],[0,1,0],[1,1,3]])
a[tuple(l.T)] = b
print(a[0,0])
# [0 0 5]
print(a[0,1])
# [0 1 0]
print(a[1,1])
# [1 1 3]
Anne Archibald says,
When you are supplying arrays in all index slots, what you get back
has the same shape as the arrays you put in; so if you supply
one-dimensional lists, like
A[[1,2,3],[1,4,5],[7,6,2]]
what you get is
[A[1,1,7], A[2,4,6], A[3,5,2]]
When you compare that with your example, you see that
a[l] = b tells NumPy to set
a[0,0,1] = [0,0,5]
a[0,1,1] = [0,1,0]
and leaves the third element of b unassigned. This is why you get the error
ValueError: array is not broadcastable to correct shape
The solution is to transpose the array l into the correct shape:
In [50]: tuple(l.T)
Out[50]: (array([0, 0, 1]), array([0, 1, 1]))
(You could also use zip(*l), but tuple(l.T) is a bit quicker.)
Or with your same arrays you can use
for i in range(len(l)):
a[l[i][0]][l[i][1]]=b[i]

Categories

Resources