Modify masked tensor or array in function - python

I am trying to modify a masked pytorch tensor inside a function.
I observe the same behaviour for numpy arrays.
from torch import tensor

def foo(x):
    """
    Minimal example.
    The actual function is complex.
    """
    x *= -1

y = tensor([1,2,3])
mask = [False, True, False]
foo(y[mask])
print(y)
# Result: tensor([1, 2, 3]). Expected: tensor([1, -2, 3])
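A minimal sketch of what happens, using NumPy as a stand-in for torch because the question reports the same behaviour for both. Boolean-mask indexing builds a new array holding only the selected elements, so foo negates that temporary and y is untouched. A basic slice such as y[1:2] is instead a view onto y's buffer, so the in-place update reaches y. np.shares_memory is used to check which case is which.

```python
import numpy as np

def foo(x):
    # In-place negation: only visible to the caller if x shares memory with y
    x *= -1

y = np.array([1, 2, 3])
mask = np.array([False, True, False])

print(np.shares_memory(y, y[mask]))  # False: boolean (advanced) indexing returns a copy
foo(y[mask])
print(y)                             # unchanged

print(np.shares_memory(y, y[1:2]))   # True: basic slicing returns a view
foo(y[1:2])
print(y)                             # element 1 is now negated
```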
There are two obvious solutions that I can think of. Both have shortcomings I would like to avoid.
def foo1(x):
    return -x
y = tensor([1,2,3])
mask = [False, True, False]
y[mask] = foo1(y[mask])
This creates a copy of y[mask], which is not ideal for my RAM-bound application.
def foo2(x, m):
    x[m] *= -1
y = tensor([1,2,3])
mask = [False, True, False]
foo2(y, mask)
This works without a copy, but makes the function messy. It has to be aware of the mask and types. E.g. it won't work directly on scalars.
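One possible middle ground, sketched here as a hypothetical variant of foo2 rather than an established idiom: give the index argument a default of Ellipsis, so callers without a mask can omit it and x[...] selects the whole array. Python translates x[m] *= -1 into x[m] = x[m] * -1, i.e. a __setitem__ on the caller's array, so the result is written back (a temporary for the selected elements is still created). It needs an array, not a plain Python scalar; a 0-d NumPy array stands in for a scalar below.

```python
import numpy as np

def foo2(x, m=Ellipsis):
    # Hypothetical variant: m defaults to Ellipsis, so x[...] means "all of x".
    # x[m] *= -1 becomes x[m] = x[m] * -1, which writes back into x via __setitem__.
    x[m] *= -1

y = np.array([1, 2, 3])
mask = np.array([False, True, False])
foo2(y, mask)          # masked update
print(y)

z = np.array([4, 5, 6])
foo2(z)                # no mask: the whole array is updated
print(z)

s = np.array(7)        # 0-d array used in place of a scalar
foo2(s)
print(s)
```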
What is the idiomatic way to handle this problem?

Related

How to properly index to an array of changing size due to masking in python

This is a problem I've run into while developing something, and it's a hard question to phrase, so it's best explained with a simple example:
Imagine you have 4 random number generators which generate an array of size 4:
[rng-0, rng-1, rng-2, rng-3]
| | | |
[val0, val1, val2, val3]
Our goal is to loop through "generations" of arrays populated by these RNGs, and iteratively mask out the RNG which outputted the maximum value.
So an example might be starting out with:
mask = [False, False, False, False], arr = [0, 10, 1, 3], and so we would mask out rng-1.
Then the next iteration could be: mask = [False, True, False, False], arr = [2, 1, 9] (before it gets asked: yes, arr HAS to decrease in size with each rng that is masked out). In this case it is clear that rng-3 should be masked out (i.e. mask[3] = True), but since arr now has a different size than mask, it is tricky to get the right index for setting the mask (the max of arr is at index 2 of arr, but the corresponding generator is index 3). This problem grows more and more difficult as more generators get masked out (in my case I'm dealing with a mask of size ~30).
If it helps, here is a Python version of the example:
import numpy as np

rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool) # True if generator is being masked
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())  # one value per unmasked generator
    unadjusted_max_value_idx = arr.argmax()
    adjusted_max_value_idx = unadjusted_max_value_idx + ????
    mask[adjusted_max_value_idx] = True
Any idea a good way to map the index of the max value in the arr to the corresponding index in the mask? (i.e. moving from unadjusted_max_value_idx to adjusted_max_value_idx)
# Use a helper list of the original indices
import numpy as np

rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
ndxLst = list(range(mask.size))  # original indices of the generators still unmasked
maskHistory = []
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    # pop() returns the original index and removes it, keeping ndxLst aligned with arr
    adjusted_max_value_idx = ndxLst.pop(unadjusted_max_value_idx)
    mask[adjusted_max_value_idx] = True
    maskHistory.append(adjusted_max_value_idx)
print(maskHistory)
print(mask)
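An alternative sketch without the helper list, assuming (as in the question) that arr holds one value per unmasked generator in ascending generator order. np.flatnonzero(~mask) returns the indices of the unmasked generators in that same order, so position k in arr maps to np.flatnonzero(~mask)[k]. It should produce the same history as the helper-list version.

```python
import numpy as np

rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
mask_history = []
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    # Indices (into mask) of the generators still active, in ascending order;
    # arr[k] was produced by generator active_idx[k].
    active_idx = np.flatnonzero(~mask)
    adjusted_max_value_idx = int(active_idx[unadjusted_max_value_idx])
    mask[adjusted_max_value_idx] = True
    mask_history.append(adjusted_max_value_idx)
print(mask_history)
print(mask)
```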

numpy: sqrt in place: is this a bug?

I'm trying to do sqrt in place on a portion of an array, selected using a boolean mask.
Why doesn't this work:
import numpy as np
a = np.array([[4,9],[16,25]], dtype='float64')
np.sqrt(a[[True, False], :], out=a[[True, False], :])
print(a[[True, False], :]) # prints [[4, 9]], sqrt in place failed
print('')
b = np.zeros_like(a[[True, False], :])
np.sqrt(a[[True, False], :], out=b)
print(b) # prints [[2, 3]] sqrt in b succeeded
If I select a single index instead, this works (but it doesn't help me, since I want to do a sparse update):
import numpy as np
a = np.array([[4,9],[16,25]], dtype='float64')
np.sqrt(a[0, :], out=a[0, :])
print(a[0, :]) # prints [2, 3]
print('')
b = np.zeros_like(a[0, :])
np.abs(a[0, :], out=b)
print(b) # prints [2, 3]
This is explained in the indexing documentation, relevant part:
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
Indexing with a boolean array is considered "advanced", hence you always get a copy, and modifying it won't touch the original data. Indeed, in your first example b is modified but a is not. Indexing with a single integer (basic indexing) returns a "view" instead, and that is why the original data is modified.
The question identifies that an in-place square root is possible on a simple slice. So given the sparse update, one could loop over the True elements of the (sparse) boolean mask doing in-place square-roots on such slices.
It is not as efficient as it hypothetically could be if the boolean mask indexing returned a view of the original array, but it may be better than nothing.
import numpy as np
a = np.array([[4,9],[16,25]], dtype='float64')
mask = np.array([True, False])
for (i,) in np.argwhere(mask):
    row = a[i]  # integer (basic) indexing returns a view of that row
    np.sqrt(row, out=row)
print(a)
Gives:
[[ 2. 3.]
[ 16. 25.]]
np.sqrt() also returns its result. Because out=a[[True, False], :] is a temporary copy, the values written there never reach a, but assigning the returned array back through the mask does update a. So replace the line np.sqrt(a[[True, False], :], out=a[[True, False], :]) with a[[True, False], :] = np.sqrt(a[[True, False], :]) to get the result of the sqrt function into array a.
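For the boolean-mask case in the question, a loop-free sketch is also possible with the ufunc where= argument (a swapped-in technique, not taken from either answer). With out=a the result is written into the original buffer, and where= restricts the computation to positions where the broadcast condition is True; everywhere else out keeps its existing values, which here are a's own values. The mask is reshaped to (2, 1) so it broadcasts across each row.

```python
import numpy as np

a = np.array([[4, 9], [16, 25]], dtype='float64')
mask = np.array([True, False])  # rows to update

# out=a: write into a itself; where=: only compute where the condition is True.
# Positions where the condition is False keep their current values in out (= a).
np.sqrt(a, out=a, where=mask[:, None])
print(a)
```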

Reduce boolean values in python ndarray using AND

I have a NumPy array of shape [3, 1000, 3] with boolean values inside. The first 3 is the batch size, and the values of a batch look like this:
[[False, False, False]
[False, True, True]
[False, False, True]
[True, True, True]
...
]
size (1000, 3)
I want to apply the and function to each triplet to end up with this new array
[[False]
[False]
[False]
[True]
...
]
size (3, 1000)
Looking through NumPy, I didn't find anything useful. I've also tried to import operator and apply reduce(operator.and_, array), but it doesn't work.
Any idea to solve this?
You can easily do this using np.all.
This will check if all values along the last dimension are True:
y = np.all(arr, axis=-1)
y.shape # (3, 1000)
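A small sketch using the four example triplets from the question as a hypothetical batch of one. It compares np.all with np.logical_and.reduce (the same AND-reduction spelled as a ufunc method) and shows why reduce(operator.and_, array) did not give the expected result: functools.reduce iterates over axis 0, the batch axis, so it ANDs batches together rather than triplet entries. Moving the last axis to the front makes it reduce over the triplets instead.

```python
import functools
import operator

import numpy as np

# Hypothetical batch of 1 built from the question's example triplets: shape (1, 4, 3)
arr = np.array([[[False, False, False],
                 [False, True,  True],
                 [False, False, True],
                 [True,  True,  True]]])

y_all = np.all(arr, axis=-1)                    # shape (1, 4)
y_reduce = np.logical_and.reduce(arr, axis=-1)  # same reduction via the ufunc method

# functools.reduce walks axis 0; moving the triplet axis to the front
# makes it combine arr[..., 0] & arr[..., 1] & arr[..., 2].
y_functools = functools.reduce(operator.and_, np.moveaxis(arr, -1, 0))

print(y_all)
```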

Python list notation, Numpy array notation: predictions[predictions < 1e-10] = 1e-10

I am trying to find out what operation is applied to the list. I have a list/array named predictions and am executing the following instruction:
predictions[predictions < 1e-10] = 1e-10
This code snippet is from a Udacity Machine Learning assignment that uses Numpy.
It was used in the following manner:
def logprob(predictions, labels):
    """Log-probability of the true labels in a predicted batch."""
    predictions[predictions < 1e-10] = 1e-10
    return np.sum(np.multiply(labels, -np.log(predictions))) / labels.shape[0]
As pointed out by #MosesKoledoye and various others, it is actually a Numpy array. (Numpy is a Python library)
What does this line do?
As pointed out by #MosesKoledoye, predictions is most likely a numpy array.
A boolean array is then generated by predictions < 1e-10. At every index where that boolean array is True, the value is changed to 1e-10, i.e. 10^-10.
Example:
>>> a = np.array([1,2,3,4,5]) #define array
>>> a < 3 #define boolean array through condition
array([ True, True, False, False, False], dtype=bool)
>>> a[a<3] #select elements using boolean array
array([1, 2])
>>> a[a<3] = -1 #change value of elements which fit condition
>>> a
array([-1, -1, 3, 4, 5])
In this code, the likely reason is to avoid taking the log of zero (-np.log(0) is inf) or of a negative number (which gives nan): any such value is replaced by a very small positive number instead.
All elements of the array, for which the condition (element < 1e-10) is true, are set to 1e-10.
Practically you are setting a minimum value.
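A minimal sketch comparing the boolean-mask assignment with np.maximum, which expresses the same floor as an elementwise maximum (swapped in for comparison only; the sample values are made up). Both forms write into the array in place, which also means logprob modifies the caller's predictions array.

```python
import numpy as np

predictions = np.array([0.5, 0.0, 1e-12, 0.3])  # made-up sample values

p1 = predictions.copy()
p1[p1 < 1e-10] = 1e-10           # boolean-mask assignment, as in logprob

p2 = predictions.copy()
np.maximum(p2, 1e-10, out=p2)    # elementwise floor, written in place via out=

print(p1)
print(np.array_equal(p1, p2))
```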

Numpy.ma: Keep masked array linked to its building blocks (shallow copy?)

Suppose you are building a masked array inside a class:
class myclass(object):
    def __init__(self, data, mask):
        self.masked_array = numpy.ma.masked_array(data, mask=mask)
What I would like is for mask and data to change when I change the masked array. Like:
data = [1,2,3]
mask = [True, False, False]
c = myclass(data, mask)
c.masked_array.mask[0] = False # this will not change mask
The obvious answer is to link them after building the object:
c = myclass(data, mask)
data = c.masked_array.data
mask = c.masked_array.mask
And, although it works, in my non-simplified problem it is quite a hack to do just for this. Any other options?
I am using numpy 1.10.1 and python 2.7.9.
The mask is itself a numpy array, so when you give a list as the mask, the values in the mask must be copied into a new array. Instead of using a list, pass in a numpy array as the mask.
For example, here are two arrays that we'll use to construct the masked array:
In [38]: data = np.array([1, 2, 3])
In [39]: mask = np.array([True, False, False])
Create our masked array:
In [40]: c = ma.masked_array(data, mask=mask)
In [41]: c
Out[41]:
masked_array(data = [-- 2 3],
mask = [ True False False],
fill_value = 999999)
Change c.mask in-place, and see that mask is also changed:
In [42]: c.mask[0] = False
In [43]: mask
Out[43]: array([False, False, False], dtype=bool)
It is worth noting that the masked_array constructor has a copy argument. If copy is False (the default), the constructor doesn't copy the input arrays and instead uses the given references (but it can't do that if the inputs are not already numpy arrays). If you use copy=True, then even input arrays will be copied, which is not what you want here.
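Putting the answer together with the question's class gives the following sketch, assuming data and mask are passed in as NumPy arrays of the right dtype so that no conversion copy is needed. copy=False is the default and is spelled out only for clarity. In-place changes made through the masked array's mask and data then show up in the original arrays.

```python
import numpy as np

class myclass(object):
    def __init__(self, data, mask):
        # With ndarray inputs and copy=False, the masked array reuses the
        # given buffers instead of copying them.
        self.masked_array = np.ma.masked_array(data, mask=mask, copy=False)

data = np.array([1, 2, 3])
mask = np.array([True, False, False])
c = myclass(data, mask)

c.masked_array.mask[0] = False   # in-place change to the mask...
print(mask)                      # ...is visible through the original mask array

c.masked_array.data[1] = 20      # in-place change to the underlying data...
print(data)                      # ...is visible through the original data array
```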
