Python - replace masked data in arrays

I would like to replace all the masked values in my 2D array with zero.
I saw that with np.copyto it was apparently possible to do that, as:
test = np.copyto(array, 0, where=mask)
But I get an error message: 'module' object has no attribute 'copyto'. Is there an equivalent way to do that?

Try numpy.ma.filled()
I think this is exactly what you need
In [29]: a
Out[29]: array([ 1, 0, 25, 0, 1, 4, 0, 2, 3, 0])
In [30]: am = np.ma.MaskedArray(np.ma.log(a), fill_value=0)
In [31]: am
Out[31]:
masked_array(data = [0.0 -- 3.2188758248682006 -- 0.0 1.3862943611198906 -- 0.6931471805599453 1.0986122886681098 --],
mask = [False True False True False False True False False True],
fill_value = 0.0)
In [32]: am.filled()
Out[32]:
array([ 0. , 0. , 3.21887582, 0. , 0. ,
1.38629436, 0. , 0.69314718, 1.09861229, 0. ])
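A minimal self-contained sketch of the same idea, using np.ma.filled as a function with an explicit fill value:

```python
import numpy as np

# np.ma.filled also works as a plain function on a masked array; an explicit
# second argument overrides the array's own fill_value for the masked slots.
a = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
print(np.ma.filled(a, 0))  # [1. 0. 3.]
```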

test = np.copyto(array, 0, where=mask) is equivalent to:
array = np.where(mask, 0, array)
test = None
(I'm not sure why you would want to assign a value to the return value of np.copyto; it always returns None if no Exception is raised.)
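A quick sketch illustrating that point:

```python
import numpy as np

# np.copyto writes into its first argument in place and returns None,
# so assigning its return value is never useful.
array = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([True, False, True, False])

result = np.copyto(array, 0, where=mask)  # modifies `array` in place
print(result)  # None
print(array)   # [0. 2. 0. 4.]
```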
Why not use array[mask] = 0?
Indeed, that would work (and has nicer syntax) if mask is a boolean array with the same shape as array. If mask doesn't have the same shape then array[mask] = 0 and np.copyto(array, 0, where=mask) may behave differently:
np.copyto (is documented to) and np.where (appears to) broadcast the shape of the mask to match array.
In contrast, array[mask] = 0 does not broadcast mask. This leads to a big difference in behavior when the mask does not have the same shape as array:
In [60]: array = np.arange(12).reshape(3,4)
In [61]: mask = np.array([True, False, False, False], dtype=bool)
In [62]: np.where(mask, 0, array)
Out[62]:
array([[ 0, 1, 2, 3],
[ 0, 5, 6, 7],
[ 0, 9, 10, 11]])
In [63]: array[mask] = 0
In [64]: array
Out[64]:
array([[ 0, 0, 0, 0],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
When array is 2-dimensional and mask is a 1-dimensional boolean array,
array[mask] is selecting rows of array (where mask is True) and
array[mask] = 0 sets those rows to zero.
Surprisingly, array[mask] does not always raise an IndexError even though the mask has 4 elements and array only has 3 rows. No IndexError is raised when the fourth value is False, but one is raised if the fourth value is True:
In [91]: array[np.array([True, False, False, False])]
Out[91]: array([[0, 1, 2, 3]])
In [92]: array[np.array([True, False, False, True])]
IndexError: index 3 is out of bounds for axis 0 with size 3
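For completeness, a small runnable sketch confirming that np.copyto broadcasts the mask the same way np.where does:

```python
import numpy as np

array = np.arange(12).reshape(3, 4)
mask = np.array([True, False, False, False])

# np.copyto broadcasts the 4-element mask across the rows, matching np.where:
np.copyto(array, 0, where=mask)
print(array)
# [[ 0  1  2  3]
#  [ 0  5  6  7]
#  [ 0  9 10 11]]
```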


What is the fastest way to get non-diagonal mirror elements in a matrix that are zero?

Let's say there is a matrix as follows:
a = np.array([[74, 0, 2],
[ 0, 73, 8],
[ 0, 10, 72]])
I want to find mirror elements that are zero in both upper and lower triangles and set them to nan. E.g. In this case a[0, 1] and a[1, 0]. I can write a loop like:
m = np.zeros((3, 3))
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        if i == j:
            m[i, j] = a[i, j]
            continue
        if (a[i, j] == 0) & (a[j, i] == 0):
            m[i, j] = np.nan
            m[j, i] = np.nan
            continue
        m[i, j] = a[i, j]
        m[j, i] = a[j, i]
print(m)
[[74. nan 2.]
[nan 73. 8.]
[ 0. 10. 72.]]
This does the job. But I have millions of these matrices and I am wondering what would be a better and faster approach.
Here's another alternative, based on my comment suggestion. Note that the "ndiag" thing is not required if there will never be zeros along the diagonal.
import numpy as np
ndiag = 1-np.eye(3)
print(ndiag)
a = np.array( [[74,0,2],[0,73,8],[0,10,72]] ).astype(float)
m = a == 0
print( m )
m = np.logical_and( ndiag, np.logical_and( m, m.T ) )
print( m )
a[m] = np.nan
print( a )
Output:
[[0. 1. 1.]
[1. 0. 1.]
[1. 1. 0.]]
[[False True False]
[ True False False]
[ True False False]]
[[False True False]
[ True False False]
[False False False]]
[[74. nan 2.]
[nan 73. 8.]
[ 0. 10. 72.]]
I've always had a preference for triu_indices and tril_indices for this sort of task. The nice thing is that they're just indices, so if all your matrices are the same size, you can cache them once without copying any specific data. The other nice thing is that for a given size n, you have that triu_indices(n, 1) is the swapped result of tril_indices(n, -1) up to some sorting that you don't generally care about.
So if all your matrices are of shape (n, n),
rows, cols = np.triu_indices(n, 1)
mask = (a[rows, cols] == 0) & (a[cols, rows] == 0)  # zero in both mirror positions
a[rows[mask], cols[mask]] = a[cols[mask], rows[mask]] = np.nan
Keep in mind that you can't assign np.nan to an array unless it's a floating point type. Also, you may get a tiny bit of mileage out of pre-computing rows[mask] and cols[mask]:
rm = rows[mask]
cm = cols[mask]
a[rm, cm] = a[cm, rm] = np.nan
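Putting those pieces together, here is a runnable sketch on the sample matrix (my own assembly of the steps above):

```python
import numpy as np

a = np.array([[74, 0, 2],
              [0, 73, 8],
              [0, 10, 72]], dtype=float)  # float so it can hold NaN

n = a.shape[0]
rows, cols = np.triu_indices(n, 1)                   # strict upper-triangle indices
mask = (a[rows, cols] == 0) & (a[cols, rows] == 0)   # zero in both mirror positions
rm, cm = rows[mask], cols[mask]
a[rm, cm] = a[cm, rm] = np.nan
print(a)
```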
Here is a completely vectorised approach to solve this -
np.where(np.logical_and(np.tril(a) == np.triu(a).T, a==0), np.nan, a)
array([[74., nan, 2.],
[nan, 73., 8.],
[ 0., 10., 72.]])
Explanation -
Lets see what happens in the first step -
np.tril(a) #keeps only the lower triangular, and others become 0
array([[74, 0, 0],
[ 0, 73, 0],
[ 0, 10, 72]])
np.triu(a).T #keeps only the upper triangular and others become 0. Then flips it to become lower triangular
array([[74, 0, 0],
[ 0, 73, 0],
[ 2, 8, 72]])
Equating these gives True everywhere in the upper triangular part, while the lower triangular part contains True only for mirror-matching elements.
np.tril(a) == np.triu(a).T
array([[ True, True, True],
[ True, True, True],
[False, False, True]])
Now, when you take a logical_and of this boolean with the a==0 matrix, only the values where the original matrix had 0 and were mirror elements remain.
np.logical_and(np.tril(a) == np.triu(a).T, a==0)
array([[False, True, False],
[ True, False, False],
[False, False, False]])
Now you can use np.where to replace True values with nan and keep the remaining values intact.
np.where(np.logical_and(np.tril(a) == np.triu(a).T, a==0), np.nan, a)
array([[74., nan, 2.],
[nan, 73., 8.],
[ 0., 10., 72.]])
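Since the question mentions millions of matrices, one further option (my own sketch, not part of the answers above) is to stack them into a single 3D array and apply the mirrored-zero mask to the whole batch at once:

```python
import numpy as np

# Hypothetical batch: k matrices of shape (n, n) stacked into one array.
batch = np.array([[[74, 0, 2],
                   [0, 73, 8],
                   [0, 10, 72]]] * 2, dtype=float)

# Transposing the last two axes mirrors every matrix in the stack; an element
# is masked when it and its mirror are both zero and it is off the diagonal.
batch_t = batch.transpose(0, 2, 1)
n = batch.shape[-1]
offdiag = ~np.eye(n, dtype=bool)
mask = (batch == 0) & (batch_t == 0) & offdiag
batch[mask] = np.nan
print(batch[0])
```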

Conditional average with numpy

Given a 3x2 array, I want to calculate the average on axis=0, but only considering values that are larger than 0.
So given the array
[ [1,0],
[0,0],
[1,0] ]
I want the output to be
# 1, 0, 1 filtered for > 0 gives 1, 1, average = (1+1)/2 = 1
# 0, 0, 0 filtered for > 0 gives nothing, so the average should be 0
[1 0]
My current code is
import numpy as np
frame = np.array([ [1,0],
[0,0],
[1,0] ])
weights=np.array(frame)>0
print("weights:")
print(weights)
print("average without weights:")
print((np.average(frame, axis=0)))
print("average with weights:")
print((np.average(frame, axis=0, weights=weights)))
This gives me
weights:
[[ True False]
[False False]
[ True False]]
average without weights:
[ 0.66666667 0. ]
average with weights:
Traceback (most recent call last):
File "C:\Users\myuser\project\test.py", line 123, in <module>
print((np.average(frame, axis=0, weights=weights)))
File "C:\Users\myuser\Miniconda3\envs\myenv\lib\site-packages\numpy\lib\function_base.py", line 1140, in average
"Weights sum to zero, can't be normalized")
ZeroDivisionError: Weights sum to zero, can't be normalized
I don't understand this error. What am I doing wrong and how can I get the average for all values greater than zero along axis=0? Thanks!
You can get the mask of greater-than-zero values and use it to do elementwise multiplication and sum-reduction along the first axis. Finally, divide by the number of masked elements along the first axis to get the average values.
Thus, one solution would be -
mask = a > 0 # Input array : a
out = np.einsum('i...,i...->...',a,mask)/mask.sum(0)
Sample run -
In [52]: a
Out[52]:
array([[ 3, -3, 3],
[ 2, 2, 0],
[ 0, -3, 1],
[ 0, 1, 1]])
In [53]: mask = a > 0
In [56]: np.einsum('i...,i...->...',a,mask) # summations of > 0s
Out[56]: array([5, 3, 5])
In [57]: np.einsum('i...,i...->...',a,mask)/mask.sum(0) # avg values of >0s
Out[57]: array([ 2.5 , 1.5 , 1.66666667])
To account for all zero columns, it seems we are expecting 0 as the result. So, we can use np.where to do the choosing, like so -
In [61]: a[:,-1] = 0
In [62]: a
Out[62]:
array([[ 3, -3, 0],
[ 2, 2, 0],
[ 0, -3, 0],
[ 0, 1, 0]])
In [63]: mask = a > 0
In [65]: np.where( mask.any(0), np.einsum('i...,i...->...',a,mask)/mask.sum(0), 0)
__main__:1: RuntimeWarning: invalid value encountered in true_divide
Out[65]: array([ 2.5, 1.5, 0. ])
Just ignore the warning there.
If you feel paranoid about warnings, use masking -
mask = a > 0
vm = mask.any(0) # valid mask
out = np.zeros(a.shape[1])
out[vm] = np.einsum('ij,ij->j',a[:,vm],mask[:,vm])/mask.sum(0)[vm]
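As an alternative sketch (my own, using numpy's masked arrays rather than einsum), the same conditional average can be written as:

```python
import numpy as np

a = np.array([[3, -3, 0],
              [2, 2, 0],
              [0, -3, 0],
              [0, 1, 0]])

# Mask everything <= 0, take the mean down axis 0, and fill columns that
# were entirely masked with 0 (the desired result for all-zero columns).
avg = np.ma.masked_less_equal(a, 0).mean(axis=0).filled(0)
print(avg)  # [2.5 1.5 0. ]
```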

Extract elements from numpy array, that are not in list of indexes

I want to do something similar to what was asked here NumPy array, change the values that are NOT in a list of indices, but not quite the same.
Consider a numpy array:
> a = np.array([0.2, 5.6, 88, 12, 1.3, 6, 8.9])
I know I can access its elements via a list of indexes, like:
> indxs = [1, 2, 5]
> a[indxs]
array([ 5.6, 88. , 6. ])
But I also need to access those elements which are not in the indxs list. Naively, this is:
> a[not in indxs]
> array([0.2, 12, 1.3, 8.9])
What is the proper way to do this?
In [170]: a = np.array([0.2, 5.6, 88, 12, 1.3, 6, 8.9])
In [171]: idx=[1,2,5]
In [172]: a[idx]
Out[172]: array([ 5.6, 88. , 6. ])
In [173]: np.delete(a,idx)
Out[173]: array([ 0.2, 12. , 1.3, 8.9])
delete is more general than you really need, using different strategies depending on the inputs. I think in this case it uses the boolean mask approach (timings should be similar).
In [175]: mask=np.ones_like(a, bool)
In [176]: mask
Out[176]: array([ True, True, True, True, True, True, True], dtype=bool)
In [177]: mask[idx]=False
In [178]: mask
Out[178]: array([ True, False, False, True, True, False, True], dtype=bool)
In [179]: a[mask]
Out[179]: array([ 0.2, 12. , 1.3, 8.9])
One way is to use a boolean mask and just set the indexed positions to False:
mask = np.ones(a.size, dtype=bool)
mask[indxs] = False
a[mask]
Another approach: use np.in1d to create a mask of the positions present in indxs, then invert it and index the input array with it for the desired output -
a[~np.in1d(np.arange(a.size),indxs)]
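Yet another option (my own sketch, not from the answers above) is np.setdiff1d, which computes the complement of the index list directly:

```python
import numpy as np

a = np.array([0.2, 5.6, 88, 12, 1.3, 6, 8.9])
indxs = [1, 2, 5]

# setdiff1d returns the sorted positions of a that are NOT listed in indxs.
keep = np.setdiff1d(np.arange(a.size), indxs)
print(a[keep])
```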

How to mix two numpy arrays using a boolean mask to create one of the same size efficiently?

I use two arrays of the same size, like:
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = -a
>>> b
array([ 0, -1, -2, -3, -4, -5, -6, -7, -8, -9])
I want to create another array using a boolean "mask", for example :
>>> m = (a % 2 == 0)
>>> m
array([ True, False, True, False, True, False, True, False, True, False], dtype=bool)
Then I create a third array of the same size and fill it with the values of a where m is True and the values of b where m is False:
>>> c = np.ones(10)
>>> c[m] = a[m]
>>> c[~m] = b[~m]
>>> c
array([ 0., -1., 2., -3., 4., -5., 6., -7., 8., -9.])
I wonder if there is a way to do the three last operations (the creation of c) within just one operation (especially for performance optimisation).
The problem with doing:
c = a * m + b * ~m
is that when there are NaNs in a or b, multiplying by zero still yields NaN.
PS: The example I gave should also work for n-dimensional arrays.
You are looking for numpy.where:
c = numpy.where(m, a, b)
Good luck.
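To see why np.where sidesteps the NaN problem raised in the question, a small sketch:

```python
import numpy as np

a = np.array([0.0, np.nan, 2.0, np.nan])
b = np.array([10.0, 11.0, 12.0, 13.0])
m = np.array([True, False, True, False])

# np.where selects elementwise from a or b; NaNs on the unselected side
# never leak into the result (unlike a * m + b * ~m, where 0 * nan is nan).
c = np.where(m, a, b)
print(c)  # [ 0. 11.  2. 13.]
```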
Using a list comprehension, you could create some conditionals that determine which list to pick from:
result = [ a[i] if bol else b[i] for i, bol in enumerate(mask)]
And then you could apply some functions to a[i] or b[i] depending on how you want to alter them.

Python: intersection indices numpy array

How can I get the indices of intersection points between two numpy arrays? I can get intersecting values with intersect1d:
import numpy as np
a = np.array(range(11))
b = np.array([2, 7, 10])
inter = np.intersect1d(a, b)
# inter == array([ 2, 7, 10])
But how can I get the indices into a of the values in inter?
You could use the boolean array produced by in1d to index an arange. Reversing a so that the indices are different from the values:
>>> a[::-1]
array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>>> a = a[::-1]
intersect1d still returns the same values...
>>> numpy.intersect1d(a, b)
array([ 2, 7, 10])
But in1d returns a boolean array:
>>> numpy.in1d(a, b)
array([ True, False, False, True, False, False, False, False, True,
False, False], dtype=bool)
Which can be used to index a range:
>>> numpy.arange(a.shape[0])[numpy.in1d(a, b)]
array([0, 3, 8])
>>> indices = numpy.arange(a.shape[0])[numpy.in1d(a, b)]
>>> a[indices]
array([10, 7, 2])
To simplify the above, though, you could use nonzero -- this is probably the most correct approach, because it returns a tuple of uniform lists of X, Y... coordinates:
>>> numpy.nonzero(numpy.in1d(a, b))
(array([0, 3, 8]),)
Or, equivalently:
>>> numpy.in1d(a, b).nonzero()
(array([0, 3, 8]),)
The result can be used as an index to arrays of the same shape as a with no problems.
>>> a[numpy.nonzero(numpy.in1d(a, b))]
array([10, 7, 2])
But note that under many circumstances, it makes sense just to use the boolean array itself, rather than converting it into a set of non-boolean indices.
Finally, you can also pass the boolean array to argwhere, which produces a slightly differently-shaped result that's not as suitable for indexing, but might be useful for other purposes.
>>> numpy.argwhere(numpy.in1d(a, b))
array([[0],
[3],
[8]])
If you need to get unique values as given by intersect1d:
import numpy as np
a = np.array([range(11,21), range(11,21)]).reshape(20)
b = np.array([12, 17, 20])
print(np.intersect1d(a,b))
#unique values
inter = np.in1d(a, b)
print(a[inter])
#you can see these values are not unique
indices=np.array(range(len(a)))[inter]
#These are the non-unique indices
_,unique=np.unique(a[inter], return_index=True)
uniqueIndices=indices[unique]
#this grabs the unique indices
print(uniqueIndices)
print(a[uniqueIndices])
#now they are unique as you would get from np.intersect1d()
Output:
[12 17 20]
[12 17 20 12 17 20]
[1 6 9]
[12 17 20]
indices = np.argwhere(np.in1d(a,b))
For Python >= 3.5, there's another solution.
Let's go through this step by step.
Based on the original code from the question
import numpy as np
a = np.array(range(11))
b = np.array([2, 7, 10])
inter = np.intersect1d(a, b)
First, we create a numpy array with zeros
c = np.zeros(len(a))
print (c)
output
>>> [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Second, set the entries of c at the intersection values to 1. (This works here only because every value of a equals its own index; inter holds values, not indices.) Hence, we have
c[inter] = 1
print (c)
output
>>>[ 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1.]
The last step, use the characteristic of np.nonzero(), it will return exactly the index of the non-zero term you want.
inter_with_idx = np.nonzero(c)
print (inter_with_idx)
Final output
(array([ 2,  7, 10]),)
As of numpy version 1.15.0 intersect1d has a return_indices option :
numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)
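Used on the question's arrays, a sketch based on that documented signature:

```python
import numpy as np

a = np.array(range(11))
b = np.array([2, 7, 10])

# return_indices=True (numpy >= 1.15) returns the intersection plus the
# index of each common value in the first and second input array.
inter, a_idx, b_idx = np.intersect1d(a, b, return_indices=True)
print(inter)  # [ 2  7 10]
print(a_idx)  # [ 2  7 10]
print(b_idx)  # [0 1 2]
```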
