numpy: sqrt in place: is this a bug? - python

I'm trying to do sqrt in place on a portion of an array, selected using a boolean mask.
Why doesn't this work:
import numpy as np
a = np.array([[4,9],[16,25]], dtype='float64')
np.sqrt(a[[True, False], :], out=a[[True, False], :])
print(a[[True, False], :]) # prints [[4, 9]], sqrt in place failed
print('')
b = np.zeros_like(a[[True, False], :])
np.sqrt(a[[True, False], :], out=b)
print(b) # prints [[2, 3]] sqrt in b succeeded
If I select a single index instead, this works (but it doesn't help me, since I want to do a sparse update):
import numpy as np
a = np.array([[4,9],[16,25]], dtype='float64')
np.sqrt(a[0, :], out=a[0, :])
print(a[0, :]) # prints [2, 3]
print('')
b = np.zeros_like(a[0, :])
np.abs(a[0, :], out=b)
print(b) # prints [2, 3]

This is explained in the indexing documentation, relevant part:
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
Indexing with a boolean array is considered "advanced", so you always get a copy, and modifying it won't touch the original data. Indeed, in your first example b is modified but a is not. Basic indexing with an integer or slices (as in a[0, :]) returns a "view" instead, which is why the original data is modified in your second example.
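A quick way to check which case you are in is np.shares_memory, which reports whether two arrays overlap in memory. A minimal sketch using the arrays from the question (the variable names view and selected are just for illustration):

```python
import numpy as np

a = np.array([[4, 9], [16, 25]], dtype='float64')
row_mask = np.array([True, False])

view = a[0, :]               # basic indexing: a view into a
selected = a[row_mask, :]    # boolean (advanced) indexing: a new array

print(np.shares_memory(a, view))      # True
print(np.shares_memory(a, selected))  # False

# Writing into the copy leaves `a` untouched.
np.sqrt(selected, out=selected)
print(a[0])  # [4. 9.]
```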

The question identifies that an in-place square root is possible on a simple slice. So given the sparse update, one could loop over the True elements of the (sparse) boolean mask doing in-place square-roots on such slices.
It is not as efficient as it hypothetically could be if the boolean mask indexing returned a view of the original array, but it may be better than nothing.
import numpy as np
a = np.array([[4,9],[16,25]], dtype='float64')
mask = np.array([True, False])
for (i,) in np.argwhere(mask):
    row = a[i]  # basic integer indexing returns a view into a
    np.sqrt(row, out=row)  # so this writes straight into a
print(a)
Gives:
[[ 2. 3.]
[ 16. 25.]]
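A different technique, not the one used in the answer above: NumPy ufuncs such as np.sqrt accept a where argument. Combined with out=a, elements where the condition is False keep whatever is already stored in out, so only the masked rows are overwritten and no boolean-indexed copy of a is made. A sketch, assuming the row mask is broadcast across the columns:

```python
import numpy as np

a = np.array([[4, 9], [16, 25]], dtype='float64')
mask = np.array([True, False])

# mask[:, None] has shape (2, 1) and broadcasts against a's shape (2, 2).
# Where the condition is False, the ufunc leaves the existing value in `out`,
# so passing out=a updates only the selected rows in place.
np.sqrt(a, out=a, where=mask[:, None])
print(a)
```

Without out=, the entries where the condition is False would be left uninitialized in the freshly allocated result, so out=a is what makes this an in-place update.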

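np.sqrt() always returns its result. When out is a boolean-indexed expression such as a[[True, False], :], that expression is a temporary copy, so the returned array is the only place the square roots end up. You can capture it by replacing the line np.sqrt(a[[True, False], :], out=a[[True, False], :]) with a = np.sqrt(a[[True, False], :], out=a[[True, False], :]), but note that a is then rebound to the selected rows only ([[2., 3.]]); the original 2x2 array is not updated in place.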
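If the goal is to keep the full array a and only change the selected rows, a common alternative is to compute on the copy and write the result back through boolean-mask assignment, which does modify a in place. A sketch with the same data:

```python
import numpy as np

a = np.array([[4, 9], [16, 25]], dtype='float64')
mask = np.array([True, False])

# a[mask] on the right-hand side is a copy; assigning back through a[mask]
# (a __setitem__ call on a) writes the results into the original array.
a[mask] = np.sqrt(a[mask])
print(a.shape)  # (2, 2): a still holds every row
```

This still allocates a temporary for the selected elements, so it is not free memory-wise, but a keeps its shape and identity.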

Related

Modify masked tensor or array in function

I am trying to modify a masked pytorch tensor inside a function.
I observe the same behaviour for numpy arrays.
from torch import tensor
def foo(x):
    """
    Minimal example.
    The actual function is complex.
    """
    x *= -1
y = tensor([1,2,3])
mask = [False, True, False]
foo(y[mask])
print(y)
# Result: tensor([1, 2, 3]). Expected: tensor([1, -2, 3])
There are two obvious solutions that I can think of. Both have shortcomings I would like to avoid.
def foo1(x):
    return -x
y = tensor([1,2,3])
mask = [False, True, False]
y[mask] = foo1(y[mask])
This creates a copy of y[mask], which is not ideal for my RAM-bound application.
def foo2(x, m):
    x[m] *= -1
y = tensor([1,2,3])
mask = [False, True, False]
foo2(y, mask)
This works without a copy, but makes the function messy. It has to be aware of the mask and types. E.g. it won't work directly on scalars.
What is the idiomatic way to handle this problem?
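For reference, the behaviour described above can be reproduced with NumPy, which the question notes behaves the same way: boolean-mask indexing produces a new array, so an in-place operation inside the function mutates that temporary, while a basic slice is a view and the change does propagate. negate_in_place is a hypothetical stand-in for foo:

```python
import numpy as np

def negate_in_place(x):
    """Hypothetical stand-in for the question's foo: mutates its argument."""
    x *= -1

y = np.array([1, 2, 3])
mask = np.array([False, True, False])

negate_in_place(y[mask])   # y[mask] is a temporary copy, so y is unchanged
print(y)                   # [1 2 3]

negate_in_place(y[1:2])    # a basic slice is a view, so y is modified
print(y)                   # [ 1 -2  3]
```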

Optimal way to modify value of a numpy array based on condition

I have a numpy.ndarray of the form
import numpy as np
my_array = np.array([[True, True, False], [True, False, True]])
In this example it is a matrix of 3 columns and two rows, but my_array is meant to have an arbitrary 2D shape. On the other hand, I have a numpy.ndarray representing a vector W whose length equals the number of rows of my_array; this vector has float values, for example W = np.array([10., 1.5]). Additionally, I have a list WT of two-tuples with the same length as W, for example WT = [(0,20.), (0,1.)]. These tuples represent mathematical intervals (a,b).
I want to modify the column values of my_array based on the following condition: given a column, we change the i-th element of the column to False (or keep it False if it already was) if the i-th element of W does not belong to the mathematical interval of the i-th two-tuple of WT. For example, the first column of my_array is [True, True], so we have to check whether 10. belongs to (0,20) and whether 1.5 belongs to (0,1); the resulting column should be [True, False].
I have a for loop, but I think there is a smarter way to do this.
Obs: I don't need to change values from False to True.
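I made this implementation:
import numpy as np
my_array = np.array([[True, True, False], [True, False, True]])
W = np.array([10.0, 1.5])
WT = np.array([[0, 20], [0, 1]])
i = (W > WT[:,0]) & (W < WT[:,1])  # True where W[k] lies in the open interval (a_k, b_k)
print("my_array before", my_array)
my_array &= i[:, None]  # broadcast the per-row flag across every column
print("my_array after", my_array)
It updates every column given your conditions: rows whose W value falls outside their interval become all False, and because &= can only turn True into False, existing False values are never changed to True.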

How to properly index to an array of changing size due to masking in python

This is a problem I've run into while developing something, and it's a hard question to phrase, so it's best explained with a simple example:
Imagine you have 4 random number generators which generate an array of size 4:
[rng-0, rng-1, rng-2, rng-3]
| | | |
[val0, val1, val2, val3]
Our goal is to loop through "generations" of arrays populated by these RNGs, and iteratively mask out the RNG which outputted the maximum value.
So an example might be starting out with:
mask = [False, False, False, False], arr = [0, 10, 1, 3], and so we would mask out rng-1.
Then the next iteration could be: mask = [False, True, False, False], arr = [2, 1, 9] (before anyone asks: yes, arr HAS to shrink with each RNG that is masked out). In this case, rng-3 should clearly be masked out (i.e. mask[3] = True), but since arr is now a different size from mask, it is tricky to get the right index for setting the mask: the max of arr is at index 2 of arr, but the corresponding generator is index 3. This problem grows more and more difficult as more generators get masked out (in my case I'm dealing with a mask of size ~30).
If it helps, here is a Python version of the example:
import numpy as np
rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool) # True if generator is being masked
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())  # one value per unmasked generator
    unadjusted_max_value_idx = arr.argmax()
    adjusted_max_value_idx = unadjusted_max_value_idx + ????
    mask[adjusted_max_value_idx] = True
Any idea a good way to map the index of the max value in the arr to the corresponding index in the mask? (i.e. moving from unadjusted_max_value_idx to adjusted_max_value_idx)
# Use a helper list of the generator indices that are still active
import numpy as np
rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool) # True if generator is being masked
ndxLst = list(range(mask.size))  # indices of unmasked generators, in order
maskHistory = []
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    # Position k in arr belongs to the k-th still-active generator
    adjusted_max_value_idx = ndxLst.pop(unadjusted_max_value_idx)
    mask[adjusted_max_value_idx] = True
    maskHistory.append(adjusted_max_value_idx)
print(maskHistory)
print(mask)
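An alternative to the helper list, sketched here as an assumption rather than taken from the answer above: np.flatnonzero(~mask) lists the indices of the still-active generators in increasing order, which is the same order in which their values appear in arr, so indexing it with arr.argmax() maps back to the position in mask.

```python
import numpy as np

rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
mask_history = []

for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    # Indices of the still-active generators, ascending; entry k is the
    # generator that produced arr[k].
    active = np.flatnonzero(~mask)
    adjusted_max_value_idx = active[unadjusted_max_value_idx]
    mask[adjusted_max_value_idx] = True
    mask_history.append(int(adjusted_max_value_idx))

print(mask_history)
print(mask)
```

This recomputes active on each iteration, which is O(n) per step; for a mask of size ~30 that cost is negligible.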

Matrix row-wise indexing

I have a numpy array, e.g., the following
import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6]])
and also another numpy array with boolean values, e.g.,
I = np.array([[True, False, False], [False, True, False]])
I would like to get the matrix whose elements' indices are given by I. In the above example, I'd like to get the matrix
array([[1], [5]])
but if I try
B = A[I]
then I get
array([1, 5])
I understand that this is due to the fact that the number of Trues may not be the same in each row. But what if they are? Is there any way of doing this using numpy?
In fact, I'd like to use this in Theano, using the tensor module. I have Theano expressions (two T.matrix variables) that contain the above arrays. Is there any convenient way of computing the new, smaller matrix?
If you can figure out how many items are returned from each row in advance, you can just reshape your output. I'd do it like this:
n = I.sum(1).max()
x = A[I].reshape(-1, n)
print(x)
[[1]
 [5]]
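Note that I.sum(1).max() only gives the right width when every row has the same number of True entries; otherwise reshape either fails or silently groups elements from different rows together. A small guarded version, with select_per_row as a hypothetical helper name:

```python
import numpy as np

def select_per_row(A, I):
    """Return A[I] reshaped to one output row per row of I.

    Assumes I is a non-empty boolean mask with the same shape as A and the
    same number of True entries in every row; raises ValueError otherwise.
    """
    counts = I.sum(axis=1)
    if not np.all(counts == counts[0]):
        raise ValueError(f"rows select different counts: {counts.tolist()}")
    # Boolean indexing returns elements in row-major order, so each row's
    # selected elements are contiguous and reshape groups them correctly.
    return A[I].reshape(I.shape[0], counts[0])

A = np.array([[1, 2, 3], [4, 5, 6]])
I = np.array([[True, False, False], [False, True, False]])
print(select_per_row(A, I))
```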

Numpy.ma: Keep masked array linked to its building blocks (shallow copy?)

Say you are building a masked array inside a class:
import numpy
class myclass(object):
    def __init__(self, data, mask):
        self.masked_array = numpy.ma.masked_array(data, mask=mask)
What I would like is for mask and data to change when I change the masked array. Like:
data = [1,2,3]
mask = [True, False, False]
c = myclass(data, mask)
c.masked_array.mask[0] = False # this will not change mask
The obvious answer is to link them after building the object:
c = myclass(data, mask)
data = c.masked_array.data
mask = c.masked_array.mask
And, although it works, in my non-simplified problem it is quite a hack to do just for this. Any other options?
I am using numpy 1.10.1 and python 2.7.9.
The mask is itself a numpy array, so when you give a list as the mask, the values in the mask must be copied into a new array. Instead of using a list, pass in a numpy array as the mask.
For example, here are two arrays that we'll use to construct the masked array:
In [38]: data = np.array([1, 2, 3])
In [39]: mask = np.array([True, False, False])
Create our masked array:
In [40]: c = ma.masked_array(data, mask=mask)
In [41]: c
Out[41]:
masked_array(data = [-- 2 3],
mask = [ True False False],
fill_value = 999999)
Change c.mask in-place, and see that mask is also changed:
In [42]: c.mask[0] = False
In [43]: mask
Out[43]: array([False, False, False], dtype=bool)
It is worth noting that the masked_array constructor has the argument copy. If copy is False (the default), the constructor doesn't copy the input arrays, and instead uses the given references (but it can't do that if the inputs are not already numpy arrays). If you use copy=True, then even input arrays will be copied--but that's not what you want.
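Applying that advice to the question's class: a sketch, assuming it is acceptable for the object to hold its own array versions of data and mask (MyClass and its data/mask attributes are hypothetical names). Because the arrays are created before the masked array and passed in with the default copy=False, the masked array reuses their memory.

```python
import numpy as np

class MyClass(object):
    """Hypothetical variant of the question's myclass that keeps its inputs linked."""

    def __init__(self, data, mask):
        # Convert once up front; np.asarray does not copy an argument that is
        # already an ndarray of the requested dtype.
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)
        # copy=False (the default) lets the masked array reuse these buffers.
        self.masked_array = np.ma.masked_array(self.data, mask=self.mask)

c = MyClass([1, 2, 3], [True, False, False])
c.masked_array.mask[0] = False
print(c.mask)                                         # [False False False]
print(np.shares_memory(c.masked_array.data, c.data))  # True
```

If the caller passes plain lists, those lists themselves are still not linked; only the arrays stored on the object are.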
