problem with setting numpy's maskedarray fill_value - python

I cannot figure out how to set the fill_value of a real masked array to be np.nan. The array is the result of the calculation of two complex maskedarrays. Somehow, the calculated array's fill_value always gets converted to a complex fill_value, when I want a real fill_value. Even if I explicitly set the fill_value, it won't get set to a float. This is triggering ComplexWarnings in my code because it drops the imaginary part later. I am OK with setting the ang.fill_value manually, but it doesn't work.
import numpy as np
ma1 = np.ma.MaskedArray([1.1+1j, 2.2-1j])
ma2 = np.ma.MaskedArray([2.2+1j, 3.3+1j])
ma1.fill_value = np.nan + np.nan*1j
ma2.fill_value = np.nan + np.nan*1j
ang = np.ma.angle(ma1/ma2, deg=True)
ang.fill_value = np.nan
print(ang.fill_value)  # prints (nan+0j)

First, I haven't worked with angle (ma or not), and only played with np.ma on and off, mainly for SO questions.
np.angle is python code; np.ma.angle is produced by a generic wrapper on np.angle.
Without studying those, let's experiment.
Your array ratio:
In [34]: ma1/ma2
Out[34]:
masked_array(data=[(0.5856164383561644+0.18835616438356162j),
(0.526492851135408-0.46257359125315395j)],
mask=[False, False],
fill_value=(nan+nanj))
The non-ma version:
In [35]: (ma1/ma2).data
Out[35]: array([0.58561644+0.18835616j, 0.52649285-0.46257359j])
or
In [36]: np.asarray(ma1/ma2)
Out[36]: array([0.58561644+0.18835616j, 0.52649285-0.46257359j])
The angle:
In [37]: np.ma.angle(ma1/ma2, deg=True)
Out[37]:
masked_array(data=[17.829734225677196, -41.30235354815481],
mask=[False, False],
fill_value=(nan+nanj))
The data dtype looks fine, but the fill dtype is complex. Calling plain np.angle (without ma), the result is still a masked array, but with a different fill value and a scalar mask:
In [38]: np.angle(ma1/ma2, deg=True)
Out[38]:
masked_array(data=[ 17.82973423, -41.30235355],
mask=False,
fill_value=1e+20)
If we give it the "raw" data:
In [40]: np.angle((ma1/ma2).data, deg=True)
Out[40]: array([ 17.82973423, -41.30235355])
np.ma is not heavily used, so I'm not surprised that there are bugs in details like this (passing the fill value and mask through), especially in a function like this that takes a complex argument but returns a real result.
If I don't fiddle with the fill values,
In [41]: ma1 = np.ma.MaskedArray([1.1+1j, 2.2-1j])
...: ma2 = np.ma.MaskedArray([2.2+1j, 3.3+1j])
In [42]: ma1/ma2
Out[42]:
masked_array(data=[(0.5856164383561644+0.18835616438356162j),
(0.526492851135408-0.46257359125315395j)],
mask=[False, False],
fill_value=(1e+20+0j))
In [43]: np.ma.angle(ma1/ma2, deg=True)
Out[43]:
masked_array(data=[17.829734225677196, -41.30235354815481],
mask=[False, False],
fill_value=1e+20)
The angle fill is float.
Casting (nan+nanj) to float might be producing some errors or warnings that it doesn't get with (1e+20+0j). Again we'd have to examine the code.
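As a possible workaround, here is a sketch (not from the answer above) that sidesteps the propagated complex fill value: compute the angle on the plain data, then build a fresh masked array with a real fill_value. It rests on an assumption from reading NumPy's fill_value setter, not on anything verified in this thread: when an array already carries a fill value inherited from its inputs, assigning to .fill_value writes into that existing complex 0-d array, which would explain the (nan+0j).

```python
import numpy as np

ma1 = np.ma.MaskedArray([1.1+1j, 2.2-1j])
ma2 = np.ma.MaskedArray([2.2+1j, 3.3+1j])
ma1.fill_value = np.nan + np.nan*1j
ma2.fill_value = np.nan + np.nan*1j
ratio = ma1 / ma2

# np.angle on the plain ndarray returns a float64 ndarray with no fill_value attached
angles = np.angle(np.ma.getdata(ratio), deg=True)

# Fresh masked array: copy of the ratio's mask, real NaN as fill_value
ang = np.ma.MaskedArray(angles, mask=np.ma.getmaskarray(ratio).copy(), fill_value=np.nan)
print(ang.fill_value, ang.dtype)

# Masked entries are now filled with a real NaN, with no complex-to-float cast involved
ang[0] = np.ma.masked
print(ang.filled())
```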

Related

AttributeError: 'numpy.float64' object has no attribute 'mask'

I'm trying to align dithered images but my code keeps giving me the following error:
AttributeError: 'numpy.float64' object has no attribute 'mask'
Here's the code:
import numpy as np
from astropy.io import fits
import glob
from astropy.stats import sigma_clip
from reproject.mosaicking import find_optimal_celestial_wcs
from reproject import reproject_interp
# In the line below, you may have called "mydata" something else in Assignment 1
calibrated_dir = '../mydata/M52_calibrated/'
# Use glob to create file list that includes images from *all* bands
filelist = glob.glob(calibrated_dir + '*.fit')
# Find the optimal shape of the canvas that will fit all the images, as well as the optimal wcs coordinates
wcs_out, shape_out = find_optimal_celestial_wcs(filelist,auto_rotate=True)
print('Dimensions of new canvas:',shape_out) # Should be bigger than the original 2048x2048 images we started with.
bands = ['PhotB','PhotV','PhotR'] # This is the list of the three filter names
for band in bands: # Loop through the three bands
    # Get the list of all the files that were exposed in the current band
    filelist = glob.glob(calibrated_dir+'*'+band+'*')
    filelist = sorted(filelist)
    allexposures = [] # Declare an empty list. Each item of the list will hold the data array of each file in filelist.
    airmass = [] # Declare an empty list. Will hold the airmass of each file in filelist.
    texp = [] # Declare an empty list. Will hold the exposure times.
    for f in filelist:
        hdu = fits.open(f) #open the current file
        texp.append(hdu[0].header['EXPTIME']) # get the exposure time
        airmass.append(hdu[0].header['AIRMASS'])# get the air mass
        # This line runs reproject_interp to map the pixels of the image to the pixels of the canvas we created above
        # new_image_data below has the same dimensions as the larger canvas.
        new_image_data = reproject_interp(f, wcs_out,shape_out=shape_out,return_footprint=False)
        allexposures.append(new_image_data)
    # Turn the list of arrays into a 3D array
    allexposures = np.array(allexposures)
    # We have now aligned all the exposures onto the same pixels. Combine them into a single image using sigma_clip and taking the mean.
    images_masked = sigma_clip(allexposures, sigma=3.0) # Use sigma_clip to mask pixels more than 3 sigma from the mean of the exposures
    combined_image = np.ma.mean(images_masked)
    # np.ma.mean() sets pixels to 0 if there were no good pixels to take a mean. The following lines set them to NaN instead.
    # NaN means "not a number" - easier to mask later on.
    mask = combined_image.mask
    combined_image = combined_image.data
    combined_image[mask] = np.nan
    ...
The error comes from the line mask = combined_image.mask. I've tried combined_image = np.ma.mean(images_masked.astype(float64)) and other variations like that, but I cannot get rid of the error. I am new at coding and this is for a class, so please be kind; I'm really not good at this. Any help is greatly appreciated.
I haven't used astropy, but apparently that use of sigma_clip produces a masked array. To illustrate:
In [18]: arr = np.ma.masked_equal([[1,2,3],[2,2,1]],2)
In [19]: arr
Out[19]:
masked_array(
data=[[1, --, 3],
[--, --, 1]],
mask=[[False, True, False],
[ True, True, False]],
fill_value=2)
A simple mean, over all values, produces a plain number (a numpy.float64, which has no mask attribute, hence your error):
In [20]: np.ma.mean(arr)
Out[20]: 1.6666666666666667
But a mean along one of the axes produces another masked array (masked wherever every element in that row or column is masked).
In [21]: np.ma.mean(arr, axis=0)
Out[21]:
masked_array(data=[1.0, --, 2.0],
mask=[False, True, False],
fill_value=1e+20)
In [22]: np.ma.mean(arr, axis=1)
Out[22]:
masked_array(data=[2.0, 1.0],
mask=[False, False],
fill_value=1e+20)
The replacement that you attempt works:
In [36]: x=np.ma.mean(arr, axis=0)
In [37]: x
Out[37]:
masked_array(data=[1.0, --, 2.0],
mask=[False, True, False],
fill_value=1e+20)
In [38]: y = x.data
In [39]: y[x.mask] = np.nan
In [40]: y
Out[40]: array([ 1., nan, 2.])
A more direct way to do this:
In [47]: x=np.ma.mean(arr, axis=0)
In [48]: x.filled(np.nan)
Out[48]: array([ 1., nan, 2.])
allexposures is a 3D array holding all the exposures of the current band, with shape (nexposures, ny, nx), so to combine the images you want to do the sigma-clipping along the first axis, and take the mean along the first axis as well:
images_masked = sigma_clip(allexposures, sigma=3.0, axis=0)
combined_image = np.ma.mean(images_masked, axis=0)
images_masked will still be a 3D masked array in which the clipped elements are masked, and combined_image will be a 2D masked array.
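To see both answers together without astropy, here is a small sketch on made-up data. The hand-built mask is a hypothetical stand-in for sigma_clip; everything else uses only np.ma. It shows that a mean with no axis collapses to a plain float (the source of the AttributeError), while axis=0 gives a 2D masked array that filled(np.nan) turns into an ordinary array.

```python
import numpy as np

# Hypothetical stack of 3 tiny 2x2 exposures, shape (nexposures, ny, nx)
allexposures = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.2, 2.2], [3.2, 4.2]],
    [[0.8, 1.8], [2.8, 3.8]],
])

# Stand-in for sigma_clip: mask pixel (0, 0) in every exposure,
# so that pixel has no good values left to average
clip_mask = np.zeros(allexposures.shape, dtype=bool)
clip_mask[:, 0, 0] = True
images_masked = np.ma.masked_array(allexposures, mask=clip_mask)

# No axis: a single numpy.float64, which has no .mask attribute
overall = np.ma.mean(images_masked)
print(type(overall))

# axis=0: a 2D masked array, masked where all exposures were clipped
combined_image = np.ma.mean(images_masked, axis=0)
print(combined_image)

# Replace the fully masked pixel with NaN and get a plain ndarray back
final = combined_image.filled(np.nan)
print(final)
```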

Summing matrix rows excluding indices from other array

Say I have some matrix W of shape MxN and a long array of indices z with shape Mx1.
Now, assume I'd like to sum the elements of each row of W, excluding the element at the index that z gives for that row.
1-d example:
import numpy as np
W = np.array([1.0, 2.0, 8.0])
z = 2
np.sum(np.delete(W,z))
MxN example and desired output:
import numpy as np
W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
z = np.array([0,2]).reshape(2,1)
# desired output
# [10. 20.]
I tried to use np.delete and axis=1 with no success
I managed to get around it using tricks like:
W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
z = np.array([0,2])
W[np.arange(z.shape[0]), z]=0
print(np.sum(W, axis=1))
# [10. 20.]
but I'm wondering if there's a more elegant way.
Using broadcasting to get the mask to simulate deletion and then sum-reduce -
(W*(z != np.arange(W.shape[-1]))).sum(-1)
Sample runs -
For 2D case :
In [61]: W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
...: z = np.array([0,2]).reshape(2,1)
In [62]: (W*(z != np.arange(W.shape[-1]))).sum(-1)
Out[62]: array([10., 20.])
Works just as well for the 1D case :
In [59]: W = np.array([1.0, 2.0, 8.0])
...: z = 2
In [60]: (W*(z != np.arange(W.shape[-1]))).sum(-1)
Out[60]: 3.0
For 2D case, with np.einsum for the sum-reduction -
In [53]: np.einsum('ij,ij->i',W,z != np.arange(W.shape[1]))
Out[53]: array([10., 20.])
Summing and then subtracting the z-indexed values for 2D case -
In [134]: W.sum(1) - np.take_along_axis(W,z,axis=1).squeeze(1)
Out[134]: array([10., 20.])
Extend to handle both 2D and 1D cases -
W.sum(-1)-np.take_along_axis(W,np.atleast_1d(z),axis=-1).squeeze(-1)
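A variation not taken from the answers above: np.where instead of multiplying by the boolean mask. It gives the same sums for finite data, but also ignores a NaN or inf stored at an excluded position, which the multiply approach propagates because 0 * nan and 0 * inf are both nan. The helper name sum_excluding is hypothetical.

```python
import numpy as np

def sum_excluding(W, z):
    # Hypothetical helper: True everywhere except column z[i] of row i
    # ((M, 1) broadcast against (N,); also works for a 1D W with scalar z)
    keep = z != np.arange(W.shape[-1])
    # Select 0.0 at excluded positions instead of multiplying by 0
    return np.where(keep, W, 0.0).sum(-1)

W = np.array([[1.0, 2.0, 8.0], [5.0, 15.0, 3.0]])
z = np.array([0, 2]).reshape(2, 1)
print(sum_excluding(W, z))
print(sum_excluding(np.array([1.0, 2.0, 8.0]), 2))

# Non-finite values sitting at the excluded positions
W_nan = np.array([[np.nan, 2.0, 8.0], [5.0, 15.0, np.inf]])
print(sum_excluding(W_nan, z))
with np.errstate(invalid='ignore'):
    naive = (W_nan * (z != np.arange(W_nan.shape[-1]))).sum(-1)
print(naive)
```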
Divakar's answers are pretty good; I'll just give another perspective on your question. If you need masking to ignore certain indices while doing multiple operations on an array, you should use a NumPy masked array (np.ma.array) instead of a regular np.array. Masked arrays are designed precisely for ignoring certain indices.
See the masked array documentation for more info.
z = np.array([0,2]).reshape(2,1)
W_ma = np.ma.array(W, mask=z == np.arange(W.shape[-1]))
In [36]: W_ma
Out[36]:
masked_array(
data=[[--, 2.0, 8.0],
[5.0, 15.0, --]],
mask=[[ True, False, False],
[False, False, True]],
fill_value=1e+20)
From this W_ma masked array, you can do almost all operations the same way as with an np.array. For the sum:
W_ma.sum(1)
Out[44]:
masked_array(data=[10.0, 20.0],
mask=[False, False],
fill_value=1e+20)
To turn a masked array into a regular array, you can use compressed, filled, or compress_rowcols:
In [46]: W_ma.sum(1).compressed()
Out[46]: array([10., 20.])
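compressed and filled only differ when the result actually contains masked entries. A sketch with a hypothetical extra step, masking all of row 1 so that its sum is itself masked:

```python
import numpy as np

W = np.array([[1.0, 2.0, 8.0], [5.0, 15.0, 3.0]])
z = np.array([0, 2]).reshape(2, 1)
W_ma = np.ma.array(W, mask=z == np.arange(W.shape[-1]))

# Nothing in the row sums is masked yet, so both conversions agree
full_sums = W_ma.sum(1).compressed()
print(full_sums)

# Hypothetical extra step: mask the whole second row
W_ma[1] = np.ma.masked
row_sums = W_ma.sum(1)
print(row_sums)                 # the second sum is masked
print(row_sums.compressed())    # masked entries are dropped
print(row_sums.filled(0.0))     # masked entries are replaced by 0.0
```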
Note: I emphasize that a masked array is useful when you do multiple operations that ignore the same indices. If you only need one or two such operations, there is no point in using a masked array.

numpy: sqrt in place: is this a bug?

I'm trying to do sqrt in place on a portion of an array, selected using a boolean mask.
Why doesn't this work:
import numpy as np
a = np.array([[4,9],[16,25]], dtype='float64')
np.sqrt(a[[True, False], :], out=a[[True, False], :])
print(a[[True, False], :]) # prints [[4, 9]], sqrt in place failed
print('')
b = np.zeros_like(a[[True, False], :])
np.sqrt(a[[True, False], :], out=b)
print(b) # prints [[2, 3]] sqrt in b succeeded
If I select a single index instead, this works (but it doesn't help me, since I want to do a sparse update):
import numpy as np
a = np.array([[4,9],[16,25]], dtype='float64')
np.sqrt(a[0, :], out=a[0, :])
print(a[0, :]) # prints [2, 3]
print('')
b = np.zeros_like(a[0, :])
np.abs(a[0, :], out=b)
print(b) # prints [2, 3]
This is explained in the indexing documentation, relevant part:
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
Indexing with a boolean array is considered "advanced", hence you always get a copy, and modifying it won't touch the original data. Indeed, in your first example b is modified but a is not. Indexing with a single integer and a slice, as in a[0, :], is basic indexing, which returns a "view"; that is why the original data is modified.
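np.shares_memory can confirm the copy-versus-view distinction. For the sparse update itself, two standard NumPy options (neither taken from the answers here) are: write the result back through indexed assignment, or pass the ufunc where= argument together with out=, which computes only at the selected positions and leaves the other elements of out untouched.

```python
import numpy as np

a = np.array([[4, 9], [16, 25]], dtype='float64')
row_mask = np.array([True, False])

# Boolean (advanced) indexing returns a copy; basic indexing returns a view
print(np.shares_memory(a, a[row_mask, :]))  # False
print(np.shares_memory(a, a[0, :]))         # True

# Option 1: compute on the selected copy, then assign it back through the index
b = a.copy()
b[row_mask] = np.sqrt(b[row_mask])

# Option 2: in-place ufunc restricted by where=; row_mask[:, None] broadcasts to (2, 2)
c = a.copy()
np.sqrt(c, out=c, where=row_mask[:, None])

print(b)
print(c)
```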
The question identifies that an in-place square root is possible on a simple slice. So given the sparse update, one could loop over the True elements of the (sparse) boolean mask doing in-place square-roots on such slices.
It is not as efficient as it hypothetically could be if the boolean mask indexing returned a view of the original array, but it may be better than nothing.
import numpy as np
a = np.array([[4,9],[16,25]], dtype='float64')
mask = np.array([True, False])
# a[i] with an integer i is basic indexing, so row is a view and sqrt writes back into a
for (i,) in np.argwhere(mask):
    row = a[i]
    np.sqrt(row, out=row)
print(a)
Gives:
[[ 2. 3.]
[ 16. 25.]]
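The sqrt() does not work in place in general; it returns the resulting array. So you have to replace the line np.sqrt(a[[True, False], :], out=a[[True, False], :]) with a = np.sqrt(a[[True, False], :], out=a[[True, False], :]) to get the result of the sqrt function in array a. Note that this rebinds a to the 1x2 result, so the unselected second row is no longer part of a; it does not update the original 2x2 array.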

Numpy.ma: Keep masked array linked to its building blocks (shallow copy?)

Say you are building a masked array like this:
import numpy

class myclass(object):
    def __init__(self, data, mask):
        self.masked_array = numpy.ma.masked_array(data, mask=mask)
What I would like is for mask and data to change when I change the masked array. Like:
data = [1,2,3]
mask = [True, False, False]
c = myclass(data, mask)
c.masked_array.mask[0] = False # this will not change mask
The obvious answer is to link them after building the object:
c = myclass(data, mask)
data = c.masked_array.data
mask = c.masked_array.mask
And, although it works, in my non-simplified problem it is quite a hack to do just for this. Any other options?
I am using numpy 1.10.1 and python 2.7.9.
The mask is itself a numpy array, so when you give a list as the mask, the values in the mask must be copied into a new array. Instead of using a list, pass in a numpy array as the mask.
For example, here are two arrays that we'll use to construct the masked array:
In [38]: data = np.array([1, 2, 3])
In [39]: mask = np.array([True, False, False])
Create our masked array:
In [40]: c = ma.masked_array(data, mask=mask)
In [41]: c
Out[41]:
masked_array(data = [-- 2 3],
mask = [ True False False],
fill_value = 999999)
Change c.mask in-place, and see that mask is also changed:
In [42]: c.mask[0] = False
In [43]: mask
Out[43]: array([False, False, False], dtype=bool)
It is worth noting that the masked_array constructor has a copy argument. If copy is False (the default), the constructor doesn't copy the input arrays and instead uses the given references (but it can't do that if the inputs are not already numpy arrays). If you use copy=True, even input arrays will be copied, which is not what you want.
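Applied to the class from the question, a sketch: the only changes are calling np.ma.masked_array (numpy.ma itself is a module, not a constructor) and passing ndarrays instead of lists, so that, with the default copy=False, the masked array keeps references to them. Writing through .data is used here to show that the data is shared as well.

```python
import numpy as np

class myclass(object):
    def __init__(self, data, mask):
        # With ndarray inputs and copy=False (the default), no copies are made
        self.masked_array = np.ma.masked_array(data, mask=mask)

data = np.array([1, 2, 3])
mask = np.array([True, False, False])
c = myclass(data, mask)

c.masked_array.mask[0] = False   # writes into the original mask array
c.masked_array.data[1] = 20      # .data is a view of the original data array
print(mask)
print(data)
```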

Python numpy masked array initialization

I use masked arrays all the time in my work, but one problem I have is that initializing masked arrays is a bit clunky. Specifically, ma.zeros() and ma.empty() return masked arrays with a mask that doesn't match the array's shape. I want a full mask so that any element I don't assign to is masked by default.
In [4]: A=ma.zeros((3,))
...
masked_array(data = [ 0. 0. 0.],
mask = False,
fill_value = 1e+20)
I can subsequently assign the mask:
In [6]: A.mask=ones((3,))
...
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 1e+20)
But why should I have to use two lines to initialize an array? Alternatively, I can ignore the ma.zeros() functionality and specify the mask and data in one line:
In [8]: A=ma.masked_array(zeros((3,)),mask=ones((3,)))
...
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 1e+20)
But I think this is also clunky. I have trawled through the numpy.ma documentation but I can't find a neat way of dealing with this. Have I missed something obvious?
Well, the mask in ma.zeros is actually a special constant, ma.nomask, that corresponds to np.bool_(False). It's just a placeholder telling NumPy that the mask hasn't been set.
Using nomask actually speeds up np.ma significantly: no need to keep track of where the masked values are if we know beforehand that there are none.
The best approach is not to set your mask explicitly if you don't need it, and let np.ma set it when needed (i.e., when you end up trying to take the log of a negative number).
Side note #1: to set the mask to an array of False with the same shape as your input, use
np.ma.array(..., mask=False)
That's easier to type. Note that it's really the Python False, not np.ma.nomask. Similarly, use mask=True to force all your inputs to be masked (i.e., mask will be a bool ndarray full of True, with the same shape as the data).
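A quick check of the points above, plus np.ma.masked_all, a NumPy function not mentioned in this thread that builds a fully masked array of a given shape in one call, which is what the question asks for. Assigning an ordinary value to an element then unmasks just that element.

```python
import numpy as np

# ma.zeros leaves the mask as the nomask placeholder
Z = np.ma.zeros(3)
print(np.ma.getmask(Z) is np.ma.nomask)

# mask=True expands to a full boolean mask with the data's shape
A = np.ma.array(np.zeros(3), mask=True)
print(A.mask)

# One call: every element starts out masked
B = np.ma.masked_all((3,))
B[1] = 5.0   # assigning a value unmasks that element only
print(B)
```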
Side note #2:
If you need to set the mask after initialization, you shouldn't assign to .mask; instead, assign the special value np.ma.masked to the elements. It's safer:
a[:] = np.ma.masked
Unfortunately, your Side note #2 recommendation breaks for an array with more than one dimension:
a = ma.zeros( (2,2) )
a[0][0] = ma.masked
a
masked_array(data =
[[ 0. 0.]
[ 0. 0.]],
mask =
False,
fill_value = 1e+20)
Like the OP, I haven't found a neat way around this. Masking a whole row will initialise the mask properly:
a[0] = ma.masked
a
masked_array(data =
[[-- --]
[0.0 0.0]],
mask =
[[ True True]
[False False]],
fill_value = 1e+20)
but if this isn't what you want, you then have to do a[0] = ma.nomask to undo it. Doing a[0] = ma.nomask immediately after a = ma.zeros( (2,2) ) has no effect.
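The failure above appears to come from the chained indexing a[0][0]: a[0] is a temporary view, and masking an element of it gives only the view a new mask, while a keeps nomask. Indexing once with a[0, 0] routes the assignment through a's own __setitem__, which creates a full mask for a. A sketch:

```python
import numpy as np

a = np.ma.zeros((2, 2))

# Single indexing operation on a itself, not on the view a[0]
a[0, 0] = np.ma.masked
first_mask = a.mask.copy()
print(first_mask)

# Undo it the same way: assigning an ordinary value unmasks that element
a[0, 0] = 0.0
print(a.mask)
```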
