I'm trying to align dithered images but my code keeps giving me the following error:
AttributeError: 'numpy.float64' object has no attribute 'mask'
Here's the code:
import numpy as np
from astropy.io import fits
import glob
from astropy.stats import sigma_clip
from reproject.mosaicking import find_optimal_celestial_wcs
from reproject import reproject_interp
# In the line below, you may have called "mydata" something else in Assignment 1
calibrated_dir = '../mydata/M52_calibrated/'
# Use glob to create file list that includes images from *all* bands
filelist = glob.glob(calibrated_dir + '*.fit')
# Find the optimal shape of the canvas that will fit all the images, as well as the optimal wcs coordinates
wcs_out, shape_out = find_optimal_celestial_wcs(filelist,auto_rotate=True)
print('Dimensions of new canvas:',shape_out) # Should be bigger than the original 2048x2048 images we started with.
bands = ['PhotB','PhotV','PhotR'] # This is the list of the three filter names
for band in bands: # Loop through the three bands
    # Get the list of all the files that were exposed in the current band
    filelist = glob.glob(calibrated_dir + '*' + band + '*')
    filelist = sorted(filelist)
    allexposures = []  # Declare an empty list. Each item will hold the data array of one file in filelist.
    airmass = []       # Declare an empty list. Will hold the airmass of each file in filelist.
    texp = []          # Declare an empty list. Will hold the exposure times.
    for f in filelist:
        hdu = fits.open(f)                        # open the current file
        texp.append(hdu[0].header['EXPTIME'])     # get the exposure time
        airmass.append(hdu[0].header['AIRMASS'])  # get the air mass
        # reproject_interp maps the pixels of the image onto the pixels of the canvas we created above.
        # new_image_data below has the same dimensions as the larger canvas.
        new_image_data = reproject_interp(f, wcs_out, shape_out=shape_out, return_footprint=False)
        allexposures.append(new_image_data)
    # Turn the list of arrays into a 3D array
    allexposures = np.array(allexposures)
    # We have now aligned all the exposures onto the same pixels. Combine them into a single image
    # using sigma_clip and taking the mean.
    images_masked = sigma_clip(allexposures, sigma=3.0)  # mask pixels more than 3 sigma from the mean of the exposures
    combined_image = np.ma.mean(images_masked)
    # np.ma.mean() sets pixels to 0 if there were no good pixels to take a mean. The following lines
    # set them to NaN ("not a number") instead - easier to mask later on.
    mask = combined_image.mask
    combined_image = combined_image.data
    combined_image[mask] = np.nan
...
The error comes from the line mask = combined_image.mask. I've tried combined_image = np.ma.mean(images_masked.astype(np.float64)) and other variations like that, but I cannot get rid of the error. I am new to coding and this is for a class, so please be kind; I'm really not good at this. Any help is greatly appreciated.
I haven't used astropy, but apparently that use of sigma_clip produces a masked array. To illustrate:
In [18]: arr = np.ma.masked_equal([[1,2,3],[2,2,1]],2)
In [19]: arr
Out[19]:
masked_array(
data=[[1, --, 3],
[--, --, 1]],
mask=[[False, True, False],
[ True, True, False]],
fill_value=2)
A simple mean - on all values - produces a number:
In [20]: np.ma.mean(arr)
Out[20]: 1.6666666666666667
But a mean on one of the axes produces another masked array (masked where all elements in the row or column are also masked).
In [21]: np.ma.mean(arr, axis=0)
Out[21]:
masked_array(data=[1.0, --, 2.0],
mask=[False, True, False],
fill_value=1e+20)
In [22]: np.ma.mean(arr, axis=1)
Out[22]:
masked_array(data=[2.0, 1.0],
mask=[False, False],
fill_value=1e+20)
The replacement that you attempt works:
In [36]: x=np.ma.mean(arr, axis=0)
In [37]: x
Out[37]:
masked_array(data=[1.0, --, 2.0],
mask=[False, True, False],
fill_value=1e+20)
In [38]: y = x.data
In [39]: y[x.mask] = np.nan
In [40]: y
Out[40]: array([ 1., nan, 2.])
A more direct way to do this:
In [47]: x=np.ma.mean(arr, axis=0)
In [48]: x.filled(np.nan)
Out[48]: array([ 1., nan, 2.])
allexposures is a 3D array holding all the exposures in the current band, with shape (nexposures, nx, ny), so to combine the images you want to do the sigma clipping along the first axis, and take the mean along the first axis as well:
images_masked = sigma_clip(allexposures, sigma=3.0, axis=0)
combined_image = np.ma.mean(images_masked, axis=0)
images_masked is still a 3D masked array in which the elements that have been clipped are masked, and combined_image is a 2D masked array.
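Putting this together with the filled trick shown above, a minimal sketch (not the full assignment code) of how the end of the band loop could look:
images_masked = sigma_clip(allexposures, sigma=3.0, axis=0)  # clip each pixel across the exposures
combined_image = np.ma.mean(images_masked, axis=0)           # 2D masked array
combined_image = combined_image.filled(np.nan)               # plain 2D array, NaN where no good pixels remained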
Related
Assume grid_sheet is an array of shape (1000, 1000, 3) and array2 is a numpy array shaped (13k-ish, 3).
We're basically treating array2 like a list of RGB value combinations; each combination is unique.
grid_sheet should be treated like a screenshot, as if you'd used a snipping tool to create the image.
blank_sheet = np.zeros((grid_sheet.shape[0], grid_sheet.shape[1]))
for data in array2:
    blank_sheet = np.where((grid_sheet[:, :, 2] == data[2]) &
                           (grid_sheet[:, :, 1] == data[1]) &
                           (grid_sheet[:, :, 0] == data[0]),
                           blank_sheet + 1, blank_sheet)
The output would be a boolean-like array the same size as grid_sheet.
I don't want to use a for loop over array2 because it's just too slow.
I've tried splitting the channels to compare to their corresponding columns, but when dstacking and summing it all back together it marks nearly the entire grid with 1s. Results are the same if I merge the values together, flatten, compare, and then reshape back into an image-representable form. There are a number of other ideas I've tried, and plenty of Stack Overflow solutions I've tried to merge with others. I hardly see any point in nditer; someone suggested itertools, but I don't think the two mesh well together.
We can use broadcasting for this purpose, but first we have to add additional axes and slightly reorganize the data.
To apply the comparison along the third axis we have to transpose the array of colors, so that the colors are numbered by the second index, and put two additional dimensions at the beginning to line up with the image plane:
colors = array2.T[None,None,:,:]
Now colors has 4 dimensions and its shape is (1, 1, 3, len(array2)). The next step is to add a fourth dimension to the image, which will correspond to the index of each color in array2:
image = grid_sheet[:,:,:,None]
Now image also has 4 dimensions and its shape is (1000, 1000, 3, 1). If we compare image and colors, broadcasting performs the comparison componentwise along the third axis, i.e. by color channels. To find out whether all components of a color match the color at an image point we apply all(2), where 2 addresses the third axis. Then we apply any along the last dimension to find out whether any of the given colors matches the color at that image point:
result = (image == colors).all(2).any(2)
Note that after the all call the number of dimensions has been reduced by 1, so the index of the last dimension is now 2. That's why we pass 2 as the parameter of any.
Test case
from numpy import arange, array, newaxis
image = arange(3*3*2).reshape(3,3,2)
colors = array([[0,1], [2,3], [4,5], [8,9]])
expected = array([
    [ True,  True,  True],
    [False,  True, False],
    [False, False, False]
])
image = image[:, :, :, newaxis]
colors = colors.T[newaxis, newaxis, :, :]
assert colors.shape == (1,1,2,4)
assert image.shape == (3,3,2,1)
result = (image == colors).all(2).any(2)
assert (result == expected).all()
An example with Dask to process big pictures
import numpy as np
import dask.array as da
from dask.distributed import Client
client = Client(n_workers=4)
display(client)
# X, Y : a size of an image
# N : a number of colors to check
X, Y, N = 1080, 1920, 13921
# dX, dY, dN : dimensions of chunks for dask.array
# values vary by computer
dX, dY, dN = 360, 640, 500
image = np.arange(X*Y*3).reshape(X, Y, 3)
colors = np.arange(N*3).reshape(N, 3)
image = image[:, :, :, None]
colors = colors.T[None, None, :, :]
# the chunk shapes should follow the logic of the broadcasting
im = da.from_array(image, chunks=(dX, dY, 3, 1))
co = da.from_array(colors, chunks=(1, 1, 3, dN))
im, co = da.broadcast_arrays(im, co)
re = (im == co).all(2).any(2)
result = re.compute()
assert result.shape == (X, Y)
assert result.sum() == N
This is a problem I've run into when developing something, and it's a hard question to phrase, so it's best explained with a simple example:
Imagine you have 4 random number generators which generate an array of size 4:
[rng-0, rng-1, rng-2, rng-3]
| | | |
[val0, val1, val2, val3]
Our goal is to loop through "generations" of arrays populated by these RNGs, and iteratively mask out the RNG which outputted the maximum value.
So an example might be starting out with:
mask = [False, False, False, False], arr = [0, 10, 1, 3], and so we would mask out rng-1.
Then the next iteration could be: mask = [False, True, False, False], arr = [2, 1, 9] (before it gets asked: yes, arr HAS to decrease in size with each rng that is masked out). In this case, it is clear that rng-3 should be masked out (i.e. mask[3] = True), but since arr is now a different size from mask, it is tricky to get the right indexing for setting the mask (the max of arr is at index 2 of arr, but the corresponding generator is index 3). This problem grows more and more difficult as more generators get masked out (in my case I'm dealing with a mask of size ~30).
If it helps, here is a Python version of the example:
rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    adjusted_max_value_idx = unadjusted_max_value_idx + ????
    mask[adjusted_max_value_idx] = True
Any ideas for a good way to map the index of the max value in arr to the corresponding index in the mask? (i.e. moving from unadjusted_max_value_idx to adjusted_max_value_idx)
# use a helper list
rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
ndxLst = list(range(mask.size))
maskHistory = []
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    adjusted_max_value_idx = ndxLst.pop(unadjusted_max_value_idx)
    mask[adjusted_max_value_idx] = True
    maskHistory.append(adjusted_max_value_idx)
print(maskHistory)
print(mask)
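For comparison, a small sketch (not part of the answer above) that does the same mapping without a helper list, by indexing into the positions of the still-unmasked generators with np.flatnonzero:
rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    active = np.flatnonzero(~mask)     # original indices of the unmasked generators
    mask[active[arr.argmax()]] = True  # map the compressed argmax back to the full-size mask
print(mask)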
Say I have some matrix W of shape MxN and a long array of indices z with shape Mx1.
Now, assume I'd like to sum up the elements of each row in W, excluding the index that appears for that row in z.
1-d example:
import numpy as np
W = np.array([1.0, 2.0, 8.0])
z = 2
np.sum(np.delete(W,z))
MxN example and desired output:
import numpy as np
W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
z = np.array([0,2]).reshape(2,1)
# desired output
# [10. 20.]
I tried to use np.delete with axis=1, with no success.
I managed to get around it using tricks like:
W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
z = np.array([0,2])
W[np.arange(z.shape[0]), z]=0
print(np.sum(W, axis=1))
# [10. 20.]
but I'm wondering if there's a more elegant way.
Using broadcasting to get the mask to simulate deletion and then sum-reduce -
(W*(z != np.arange(W.shape[-1]))).sum(-1)
Sample runs -
For 2D case :
In [61]: W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
...: z = np.array([0,2]).reshape(2,1)
In [62]: (W*(z != np.arange(W.shape[-1]))).sum(-1)
Out[62]: array([10., 20.])
Works just as well for the 1D case :
In [59]: W = np.array([1.0, 2.0, 8.0])
...: z = 2
In [60]: (W*(z != np.arange(W.shape[-1]))).sum(-1)
Out[60]: 3.0
For 2D case :
With np.einsum for the sum-reduction -
In [53]: np.einsum('ij,ij->i',W,z != np.arange(W.shape[1]))
Out[53]: array([10., 20.])
Summing and then subtracting the z-indexed values for 2D case -
In [134]: W.sum(1) - np.take_along_axis(W,z,axis=1).squeeze(1)
Out[134]: array([10., 20.])
Extend to handle both 2D and 1D cases -
W.sum(-1)-np.take_along_axis(W,np.atleast_1d(z),axis=-1).squeeze(-1)
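A quick verification sketch of that combined form against both of the sample inputs above:
import numpy as np

check = lambda W, z: W.sum(-1) - np.take_along_axis(W, np.atleast_1d(z), axis=-1).squeeze(-1)

W2 = np.array([[1.0, 2.0, 8.0], [5.0, 15.0, 3.0]])
z2 = np.array([0, 2]).reshape(2, 1)
print(check(W2, z2))  # [10. 20.]

W1 = np.array([1.0, 2.0, 8.0])
z1 = 2
print(check(W1, z1))  # 3.0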
@Divakar's answers are pretty good. I'll just give another perspective on your question. If you need masking to ignore certain indices and want to do multiple operations on the array, you should use a numpy masked array, np.ma.array, instead of a regular np.array. Masked arrays exist precisely for the purpose of ignoring certain indices.
See the masked array documentation for more info.
z = np.array([0,2]).reshape(2,1)
W_ma = np.ma.array(W, mask=z == np.arange(W.shape[-1]))
In [36]: W_ma
Out[36]:
masked_array(
data=[[--, 2.0, 8.0],
[5.0, 15.0, --]],
mask=[[ True, False, False],
[False, False, True]],
fill_value=1e+20)
From this W_ma masked array, you can do almost all the operations you would do on an np.array. For the sum:
W_ma.sum(1)
Out[44]:
masked_array(data=[10.0, 20.0],
mask=[False, False],
fill_value=1e+20)
To turn the masked array back into a regular array, you can use compressed, filled, or compress_rowcols:
In [46]: W_ma.sum(1).compressed()
Out[46]: array([10., 20.])
Note: I emphasize that a masked array is useful when you do multiple operations while ignoring certain indices. If you only need to do one or two such operations, there is no point in using a masked array.
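For example, the point about multiple operations: once W_ma is built, the same mask is reused by every reduction (a small sketch continuing from W_ma above):
print(W_ma.mean(1))  # per-row mean of the unmasked entries: 5.0 and 10.0
print(W_ma.max(1))   # per-row max of the unmasked entries: 8.0 and 15.0
print(W_ma.std(1))   # per-row standard deviation, same mask, no extra bookkeeping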
Say I have a numpy array that contains some float('nan') values. I don't want to impute those data yet; I want to normalize the rest and keep the NaN entries in their original places. Is there any way I can do that?
Previously I used the normalize function in sklearn.preprocessing, but that function doesn't seem to accept arrays containing NaN as input.
You can mask your array using the numpy.ma.array function and subsequently apply any numpy operation:
import numpy as np
a = np.random.rand(10) # Generate random data.
a = np.where(a > 0.8, np.nan, a) # Set all data larger than 0.8 to NaN
a = np.ma.array(a, mask=np.isnan(a)) # Use a mask to mark the NaNs
a_norm = a / np.sum(a) # The sum function ignores the masked values.
a_norm2 = a / np.std(a) # The std function ignores the masked values.
You can still access your raw data:
print(a.data)
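If you want a plain ndarray back with the NaNs restored in their original positions (which is what the question asks for), the masked normalized result from above can be filled again:
print(a_norm.filled(np.nan))  # ordinary ndarray with NaN where the original values were NaN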
You can use numpy.nansum to compute the norm and ignore nan:
In [54]: x
Out[54]: array([ 1., 2., nan, 3.])
Here's the norm with nan ignored:
In [55]: np.sqrt(np.nansum(np.square(x)))
Out[55]: 3.7416573867739413
y is the normalized array:
In [56]: y = x / np.sqrt(np.nansum(np.square(x)))
In [57]: y
Out[57]: array([ 0.26726124, 0.53452248, nan, 0.80178373])
In [58]: np.linalg.norm(y[~np.isnan(y)])
Out[58]: 1.0
The nansum and np.ma.array answers are good options; however, those functions are not as commonly used or as explicit (IMHO) as the following:
import numpy as np

def rms(arr):
    arr = np.array(arr)  # sanitize the input
    # root-mean-square over the finite (non-NaN, non-inf) values
    return np.sqrt(np.mean(np.square(arr[np.isfinite(arr)])))

print(rms([np.nan, -1, 0, 1]))
I have been searching for a python alternative to MATLAB's inpolygon() and I have come across contains_points as a good option.
However, the docs are a little bare with no indication of what type of data contains_points expects:
contains_points(points, transform=None, radius=0.0)
Returns a bool array which is True if the path contains the corresponding point.
If transform is not None, the path will be transformed before performing the test.
radius allows the path to be made slightly larger or smaller.
I have the polygon stored as an n*2 numpy array (where n is quite large ~ 500). As far as I can see I need to call the Path() method on this data which seems to work OK:
poly_path = Path(poly_points)
At the moment I also have the points I wish to test stored as another n*2 numpy array (catalog_points).
Perhaps my problem lies here? As when I run:
in_poly = poly_path.contains_points(catalog_points)
I get back an ndarray containing False for every value no matter the set of points I use (I have tested this on arrays of points well within the polygon).
Often in these situations, I find the source to be illuminating...
We can see in the source that path.contains_point accepts a container that has at least 2 elements. The source for contains_points is a bit harder to figure out, since it calls through to a C function, Py_points_in_path. It seems that this function accepts an iterable that yields elements of length 2:
>>> from matplotlib import path
>>> p = path.Path([(0,0), (0, 1), (1, 1), (1, 0)]) # square with legs length 1 and bottom left corner at the origin
>>> p.contains_points([(.5, .5)])
array([ True], dtype=bool)
Of course, we could use a numpy array of points as well:
>>> points = np.array([.5, .5]).reshape(1, 2)
>>> points
array([[ 0.5, 0.5]])
>>> p.contains_points(points)
array([ True], dtype=bool)
And just to check that we aren't always just getting True:
>>> points = np.array([.5, .5, 1, 1.5]).reshape(2, 2)
>>> points
array([[ 0.5, 0.5],
[ 1. , 1.5]])
>>> p.contains_points(points)
array([ True, False], dtype=bool)
Make sure that the vertices are ordered as intended. Below, the vertices are ordered in such a way that the resulting path is a pair of triangles rather than a rectangle, so contains_points only returns True for points inside either of the triangles.
>>> p = path.Path(np.array([bfp1, bfp2, bfp4, bfp3]))
>>> p
Path([[ 5.53147871 0.78330843]
[ 1.78330843 5.46852129]
[ 0.53147871 -3.21669157]
[-3.21669157 1.46852129]], None)
>>> IsPointInside = np.array([[1, 2], [1, 9]])
>>> IsPointInside
array([[1, 2],
[1, 9]])
>>> p.contains_points(IsPointInside)
array([False, False], dtype=bool)
>>>
The output for the first point would have been True if bfp3 and bfp4 were swapped.
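If the input order is unknown, one common workaround (not from the original answer, and only valid for convex polygons) is to sort the vertices by angle around their centroid before building the Path:
import numpy as np
from matplotlib import path

def path_from_unordered(verts):
    # Sort vertices counter-clockwise around their centroid so the Path
    # traces the polygon outline instead of criss-crossing it.
    # Note: this only works for convex polygons.
    verts = np.asarray(verts, dtype=float)
    centroid = verts.mean(axis=0)
    angles = np.arctan2(verts[:, 1] - centroid[1], verts[:, 0] - centroid[0])
    return path.Path(verts[np.argsort(angles)])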
I wrote this function to return an array like MATLAB's inpolygon function does, but it only reports the points that are inside the given polygon; you can't find the points on the edge of the polygon with it.
import numpy as np
from matplotlib import path

def inpolygon(xq, yq, xv, yv):
    shape = xq.shape
    xq = xq.reshape(-1)
    yq = yq.reshape(-1)
    xv = xv.reshape(-1)
    yv = yv.reshape(-1)
    q = [(xq[i], yq[i]) for i in range(xq.shape[0])]
    p = path.Path([(xv[i], yv[i]) for i in range(xv.shape[0])])
    return p.contains_points(q).reshape(shape)
You can call the function as:
xv = np.array([0.5,0.2,1.0,0,0.8,0.5])
yv = np.array([1.0,0.1,0.7,0.7,0.1,1])
xq = np.array([0.1,0.5,0.9,0.2,0.4,0.5,0.5,0.9,0.6,0.8,0.7,0.2])
yq = np.array([0.4,0.6,0.9,0.7,0.3,0.8,0.2,0.4,0.4,0.6,0.2,0.6])
print(inpolygon(xq, yq, xv, yv))
As in the MATLAB documentation, this function
returns in, indicating if the query points specified by xq and yq are inside or on the edge of the polygon area defined by xv and yv.
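If points exactly on the edge matter, the radius argument quoted in the docstring above can be used to grow or shrink the path slightly. A sketch of a variant of inpolygon that also counts edge points; the tolerance value is arbitrary, and both signs are checked because the effect of radius depends on the orientation of the path:
import numpy as np
from matplotlib import path

def inpolygon_with_edges(xq, yq, xv, yv, tol=1e-9):
    # Same idea as inpolygon above, but grows/shrinks the path by a tiny
    # radius so points lying exactly on an edge are also counted.
    shape = xq.shape
    q = np.column_stack([xq.reshape(-1), yq.reshape(-1)])
    p = path.Path(np.column_stack([xv.reshape(-1), yv.reshape(-1)]))
    inside = p.contains_points(q, radius=tol) | p.contains_points(q, radius=-tol)
    return inside.reshape(shape)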