Optimizing logical operations on nested numpy arrays


I start with a numpy array of numpy arrays, where each of the inner numpy arrays can have different lengths. An example is given below:
import numpy as np
a = np.array([1,2,3])
b = np.array([4,5])
c = np.array([a, b], dtype=object)  # dtype=object is required for ragged input in NumPy >= 1.24
print(c)
[array([1, 2, 3]) array([4, 5])]
I'd like to perform a boolean operation on every element of every inner array in c, but when I try, I get the following ValueError:
print(c > 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
I'd like to be able to get the result:
[[True True True] [True True]]
Without using a for loop or iterating on the outer array. Is this possible, and if so, how can I accomplish it?

I can think of two broad approaches: either pad your arrays so that you can use a single 2D array instead of nested arrays, or treat your nested arrays as a list of arrays. The first would look something like this:
import numpy as np
a = np.array([1,2,3])
b = np.array([4,5, -99])
c = np.array([a, b])
print(c.shape)
# (2, 3)
print(c > 0)
# [[ True  True  True]
#  [ True  True False]]
Or do something like:
import numpy as np
a = np.array([1,2,3])
b = np.array([4,5])
c = np.array([a, b], dtype=object)  # ragged input needs dtype=object in NumPy >= 1.24
out = [i > 0 for i in c]
print(out)
# [array([ True,  True,  True]), array([ True,  True])]
If padding is not an option, you might in fact find that lists of arrays are better behaved than arrays of arrays.
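If padding is an option, the padding approach above can be sketched as follows. This is only an illustration: pad_ragged is a hypothetical helper (not a NumPy function), and the fill value -99 is an arbitrary choice. The helper loops once to build the padded array, but the comparison itself is then a single vectorized operation, and the validity mask keeps padded slots from leaking into the result.

```python
import numpy as np

def pad_ragged(arrays, fill_value):
    """Hypothetical helper: stack 1-D arrays of unequal length into a 2-D array.

    Returns the padded array and a boolean mask marking the real (non-padded) slots.
    """
    lengths = np.array([len(arr) for arr in arrays])
    width = lengths.max()
    padded = np.full((len(arrays), width), fill_value, dtype=arrays[0].dtype)
    for row, arr in enumerate(arrays):
        padded[row, :len(arr)] = arr
    # valid[i, j] is True where column j lies inside the original length of row i
    valid = np.arange(width) < lengths[:, None]
    return padded, valid

a = np.array([1, 2, 3])
b = np.array([4, 5])
padded, valid = pad_ragged([a, b], fill_value=-99)

# One vectorized comparison; the mask forces padded slots to False
result = (padded > 0) & valid
print(result)
```

Combining with the mask means the outcome does not depend on the fill value happening to fail the test.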

Related

finding elements in an array containing numpy array [duplicate]

I have a list of numpy arrays, say,
a = [np.random.rand(3, 3), np.random.rand(3, 3), np.random.rand(3, 3)]
and I have a test array, say
b = np.random.rand(3, 3)
I want to check whether a contains b or not. However
b in a
throws the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
What is the proper way for what I want?

You can just make one array of shape (3, 3, 3) out of a:
a = np.asarray(a)
And then compare it with b (we're comparing floats here, so we should use isclose())
np.all(np.isclose(a, b), axis=(1, 2))
For example:
a = [np.random.rand(3,3),np.random.rand(3,3),np.random.rand(3,3)]
a = np.asarray(a)
b = a[1, ...] # set b to some value we know will yield True
np.all(np.isclose(a, b), axis=(1, 2))
# array([False, True, False])

As highlighted by @jotasi, the truth value is ambiguous due to element-wise comparison within the array.
There was a previous answer to this question here. Overall your task can be done in various ways:
list-to-array:
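You can use the "in" operator by converting the list to a (3,3,3)-shaped array as follows. Be aware that for an ndarray, b in a evaluates (a == b).any(), so it is True whenever any single element matches, not only when a whole sub-array equals b: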
>>> a = [np.random.rand(3, 3), np.random.rand(3, 3), np.random.rand(3, 3)]
>>> a= np.asarray(a)
>>> b= a[1].copy()
>>> b in a
True
np.all:
>>> any(np.all((b==a),axis=(1,2)))
True
list comprehension:
This is done by iterating over each array:
>>> any([(b == a_s).all() for a_s in a])
True
Below is a speed comparison of the three approaches above:
Speed Comparison
import numpy as np
import perfplot

perfplot.show(
    setup=lambda n: np.asarray([np.random.rand(3*3).reshape(3,3) for i in range(n)]),
    kernels=[
        lambda a: a[-1] in a,
        lambda a: any(np.all((a[-1]==a),axis=(1,2))),
        lambda a: any([(a[-1] == a_s).all() for a_s in a])
    ],
    labels=[
        'in', 'np.all', 'list_comprehension'
    ],
    n_range=[2**k for k in range(1,20)],
    xlabel='Array size',
    logx=True,
    logy=True,
)
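The three approaches are not strictly interchangeable, which matters when reading the timings. As a minimal sketch (the names stack and probe are made up for illustration), the following shows a case where the "in" test on a stacked array reports a match even though no sub-array equals the probe, because ndarray membership only asks whether any element of the broadcast comparison is equal.

```python
import numpy as np

# Three 2x2 blocks of zeros stacked into shape (3, 2, 2)
stack = np.zeros((3, 2, 2))

# The probe shares only its top-left element with the blocks
probe = np.array([[0.0, 1.0], [1.0, 1.0]])

# ndarray membership: equivalent to (stack == probe).any()
loose = probe in stack

# Whole-block equality: every element of one block must match
strict = bool(np.all(stack == probe, axis=(1, 2)).any())

print(loose, strict)
```

If whole-array membership is what you need, the np.all variant or the list comprehension is the safer choice.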

Ok so in doesn't work because it's effectively doing
def in_(obj, iterable):
    for elem in iterable:
        if obj == elem:
            return True
    return False
Now, the problem is that for two ndarrays a and b, a == b is an array (try it), not a boolean, so if a == b fails. The solution is to define a new function
def array_in(arr, list_of_arr):
    for elem in list_of_arr:
        if (arr == elem).all():
            return True
    return False
a = [np.arange(5)] * 3
b = np.ones(5)
array_in(b, a) # --> False

This error occurs because, if a and b are numpy arrays, a == b doesn't return True or False but an array of boolean values obtained by comparing a and b element-wise.
You can try something like this:
np.any([np.all(a_s == b) for a_s in a])
[np.all(a_s == b) for a_s in a] creates a list of boolean values by iterating through the elements of a and checking whether all elements of b and of that particular element of a are the same.
With np.any you can then check whether any element of that list is True.

As pointed out in this answer, the documentation states that:
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
a[0]==b is an array, though, containing an element-wise comparison of a[0] and b. The overall truth value of this array is obviously ambiguous. Are they the same if all elements match, or if most match, or if at least one matches? Therefore, numpy forces you to be explicit about what you mean. What you want is to test whether all elements are the same. You can do that by using numpy's all method:
any((b is e) or (b == e).all() for e in a)
or put in a function:
def numpy_in(arrayToTest, listOfArrays):
    return any((arrayToTest is e) or (arrayToTest == e).all()
               for e in listOfArrays)

Use array_equal from numpy
import numpy as np
a = [np.random.rand(3,3),np.random.rand(3,3),np.random.rand(3,3)]
b = np.random.rand(3,3)
for i in a:
    if np.array_equal(b,i):
        print("yes")
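The array_equal loop can be folded into a single membership test. The sketch below is one possible way to do it; contains_array and its approx flag are hypothetical names introduced here, and the approximate branch assumes all arrays share the same shape, since np.allclose broadcasts rather than rejecting mismatched shapes.

```python
import numpy as np

def contains_array(candidates, target, approx=False):
    """Hypothetical helper: True if target equals any array in candidates.

    approx=True compares with np.allclose (default tolerances), which is
    usually more appropriate for floats that went through arithmetic.
    """
    compare = np.allclose if approx else np.array_equal
    return any(compare(target, item) for item in candidates)

rng = np.random.default_rng(0)
a = [rng.random((3, 3)) for _ in range(3)]

present = contains_array(a, a[1].copy())                 # exact copy of an element
absent = contains_array(a, rng.random((3, 3)))           # fresh random array
close = contains_array(a, a[2] + 1e-12, approx=True)     # tiny perturbation
print(present, absent, close)
```

Because any() consumes a generator, the search stops at the first match instead of comparing against every array.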

numpy multiple boolean index arrays

I have an array which I want to use boolean indexing on, with multiple index arrays, each producing a different array. Example:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
Should return something along the lines of:
[[2,3], [1]]
I assume that since the number of cells containing True can vary between masks, I cannot expect the result to reside in a 2D numpy array, but I'm still hoping for something more elegant than iterating over the masks and appending the result of indexing w by the i-th b mask to a list.
Am I missing a better option?
Edit: The next step I want to do afterwards is to sum each of the arrays returned by w[b], returning a list of scalars. If that somehow makes the problem easier, I'd love to know as well.

Assuming you want a list of numpy arrays you can simply use a comprehension:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
[w[mask] for mask in b]
# [array([2, 3]), array([1])]
If your goal is just a sum of the masked values you can use:
np.sum(w*b) # 6
or
np.sum(w*b, axis=1) # array([5, 1])
# or b @ w
…since False times your number will be 0 and therefore won't affect the sum.
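Putting the pieces above together, here is a small sketch of the per-mask selection and the per-mask sums. The last variant, using the where argument of np.sum (available in NumPy 1.17 and later), is an alternative added here for illustration rather than part of the answer; it skips masked-out entries instead of multiplying them by zero, which matters if w could contain NaN or inf.

```python
import numpy as np

w = np.array([1, 2, 3])
b = np.array([[False, True, True],
              [True, False, False]])

# One sub-array per mask row (lengths may differ, so this stays a list)
selected = [w[mask] for mask in b]

# Per-row sums via matrix product: False counts as 0, True as 1
row_sums = b @ w

# Same sums, but masked-out entries are skipped rather than multiplied by 0
row_sums_where = np.sum(np.broadcast_to(w, b.shape), axis=1, where=b)

print(selected, row_sums, row_sums_where)
```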

Try this:
[w[x] for x in b]
Hope this helps.

While loop for two conditions to be checked simultaneously in array form?

I am writing a while loop for the following statement in python:
I have a vector valued function which returns a 2x1 array with x and y values
I want to write code that ensures the loop runs only while the [x,y] given by the function are less than [x,y]
I tried to use a.all(), but I get an AttributeError
Is there another way to check two conditions simultaneously?

I assumed that the array in your code is np.array.
Let's define the data a, b and c.
a = np.array([[3], [2]]) #[[3], [2]]
b = np.array([[2], [1]])
c = np.array([[4], [1]])
If we do the following comparison, we will obtain
In [1]: a > b
Out[1]:
array([[ True],
       [ True]])
In [2]: a > c
Out[2]:
array([[False],
       [ True]])
Since you want to ensure both conditions are true at the same time, you can use the Python built-in all(), which returns True only if every element of the iterable it is given is true.
In [3]: all(a > c)
Out[3]: False
In [4]: all(a > b)
Out[4]: True
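The built-in all() works here because each row of the 2x1 comparison holds a single element. As a hedged alternative, the array's own .all() method (or np.all) reduces over every element regardless of shape. The sketch below reuses a, b and c from above; wide and limit are made-up arrays added to show a shape where the built-in would raise the "truth value is ambiguous" error instead.

```python
import numpy as np

a = np.array([[3], [2]])
b = np.array([[2], [1]])
c = np.array([[4], [1]])

# Reduce the whole boolean array to a single truth value
both_b = bool((a > b).all())
both_c = bool(np.all(a > c))

# Made-up 2x2 case: the built-in all() would fail on the two-element rows,
# while the array method still works
wide = np.array([[3, 0], [2, 5]])
limit = np.array([[2, 1], [1, 4]])
wide_ok = bool((wide > limit).all())

print(both_b, both_c, wide_ok)
```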

You can do it either like this:
while all(abs(val) > err for val, err in zip(f(x, y), a)):
    ...
Or like this:
while abs(f(x, y)[0]) > a[0] and abs(f(x, y)[1]) > a[1]:
    ...
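A runnable version of the first loop might look like the sketch below. Everything concrete in it is assumed for illustration: f is a stand-in that simply returns its inputs as an array, tol plays the role of a, and the loop body halves x and y so that the loop terminates.

```python
import numpy as np

def f(x, y):
    # Hypothetical stand-in for the vector-valued function in the question
    return np.array([x, y])

tol = np.array([0.1, 0.01])   # assumed per-component thresholds (the "a" above)
x, y = 1.0, 1.0
steps = 0

# Keep looping only while BOTH components exceed their thresholds
while np.all(np.abs(f(x, y)) > tol):
    x, y = x / 2, y / 2       # assumed update rule, just to make progress
    steps += 1

print(steps, x, y)
```

Here the loop stops as soon as either component drops to its threshold or below; use np.any instead of np.all if it should keep running until both have done so.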

Python numpy array sum over certain indices

How can I sum a numpy array over only a given list of indices? For example, if I have an array a = [1,2,3,4] and a list of indices to sum, indices = [0, 2], I want a fast operation that gives me the answer 4, because the values at index 0 and index 2 of a sum to 4.

You can use sum directly after indexing with indices:
a = np.array([1,2,3,4])
indices = [0, 2]
a[indices].sum()

The accepted a[indices].sum() approach copies data and creates a new array, which might cause problems if the array is large. np.sum actually has a where argument to mask out elements, so you can just do
np.sum(a, where=[True, False, True, False])
which doesn't copy any data.
The mask array can be obtained by:
mask = np.full(4, False)
mask[np.array([0,2])] = True
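For comparison, here are both variants side by side as a minimal sketch. It builds the mask from the array's shape instead of hard-coding the length 4, and it assumes NumPy 1.17 or later for the where argument.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
indices = [0, 2]

# Variant 1: integer indexing creates a temporary array, then sums it
indexed_total = a[indices].sum()

# Variant 2: build a boolean mask once and let np.sum skip the rest
mask = np.zeros(a.shape, dtype=bool)
mask[indices] = True
masked_total = np.sum(a, where=mask)

print(indexed_total, masked_total)
```

Note that the mask itself is a new array of the same length as a, so the saving is in not copying the selected values, not in avoiding all allocation.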

Try:
>>> a = [1,2,3,4]
>>> indices = [0, 2]
>>> sum(a[i] for i in indices)
4
Faster
If you have a lot of numbers and you want high speed, then you need to use numpy:
>>> import numpy as np
>>> a = np.array([1,2,3,4])
>>> a[indices]
array([1, 3])
>>> np.sum(a[indices])
4

How to quickly grab specific indices from a numpy array?

But I don't have the index values, I just have ones in those same indices in a different array. For example, I have
a = array([3,4,5,6])
b = array([0,1,0,1])
Is there some NumPy method that can quickly look at both of these and extract all values from a whose indices match the indices of all 1's in b? I want it to result in:
array([4,6])
It is probably worth mentioning that my a array is multidimensional, while my b array will always have values of either 0 or 1. I tried using NumPy's logical_and function, though this returns ValueError with a and b having different dimensions:
a = numpy.array([[3,2], [4,5], [6,1]])
b = numpy.array([0, 1, 0])
print(numpy.logical_and(a,b))
ValueError: operands could not be broadcast together with shapes (3,2) (3,)
Though this method does seem to work if a is flat. Either way, the return type of numpy.logical_and() is a boolean, which I do not want. Is there another way? Again, in the second example above, the desired return would be
array([[4,5]])
Obviously I could write a simple loop to accomplish this, I'm just looking for something a bit more concise.
Edit:
This introduces a further constraint: each element of the multidimensional array a may have an arbitrary length that does not match its neighbour's.

You can simply use fancy indexing.
b == 1
will give you a boolean array:
>>> from numpy import array
>>> a = array([3,4,5,6])
>>> b = array([0,1,0,1])
>>> b==1
array([False, True, False, True], dtype=bool)
which you can pass as an index to a.
>>> a[b==1]
array([4, 6])
Demo for your second example:
>>> a = array([[3,2], [4,5], [6,1]])
>>> b = array([0, 1, 0])
>>> a[b==1]
array([[4, 5]])

You could use compress:
>>> a = np.array([3,4,5,6])
>>> b = np.array([0,1,0,1])
>>> a.compress(b)
array([4, 6])
You can provide an axis argument for multi-dimensional cases:
>>> a2 = np.array([[3,2], [4,5], [6,1]])
>>> b2 = np.array([0, 1, 0])
>>> a2.compress(b2, axis=0)
array([[4, 5]])
This method will work even if the axis of a you're indexing against is a different length to b.
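The difference between the two answers can be sketched as follows. The shortened condition array short is made up for illustration; the documented behaviour of compress is that a condition shorter than the axis truncates the output to the length of the condition, whereas boolean indexing expects the mask length to match the axis (recent NumPy versions raise an IndexError otherwise).

```python
import numpy as np

a = np.array([[3, 2], [4, 5], [6, 1]])
b = np.array([0, 1, 0])

# Boolean indexing: the mask length must match the first axis of a
by_mask = a[b == 1]

# compress with a full-length condition gives the same rows
by_compress = a.compress(b == 1, axis=0)

# compress also accepts a shorter condition: only the first two rows are considered
short = np.array([False, True])
truncated = a.compress(short, axis=0)

print(by_mask, by_compress, truncated)
```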
