I'm trying to filter an image so that each pixel's value equals the median of the pixels within a 50x50 square around it, excluding any masked pixels. This is my latest attempt, which does the following:
1. Read an image from a FITS file (looks like this...)
2. Apply a mask from another FITS file
3. Pass a 50x50 pixel window across the masked image (I think this is the best way to do it... open to suggestions) (masked image below)
4. Create a filtered copy of the masked image, where each pixel's value is the median of the unmasked pixels within the 50x50 square around it
In the code below, I've used some methods from the documentation of skimage.util.view_as_windows to produce the filtered image:
It looks to me like it's ignoring the masked pixels. My question is twofold:
Is this the best way to do it?
If so, why does it look like it's ignoring the mask?
import numpy as np
from astropy.io import fits
from skimage.util.shape import view_as_windows

# Use the FITS files as input image and mask
hdulist = fits.open('xbulge-w1.fits')
image = hdulist[0].data
hdulist3 = fits.open('xbulge-mask.fits')
mask = 1 - hdulist3[0].data
imagemasked = np.ma.masked_array(image, mask=mask)

side = 50
window_shape = (side, side)
Afiltered = view_as_windows(imagemasked, window_shape)

# collapse the last two dimensions into one
flatten_view = Afiltered.reshape(Afiltered.shape[0], Afiltered.shape[1], -1)

# resample the image by taking the median of each window
median_view = np.ma.median(flatten_view, axis=2)
Note: using side = 50 results in quite a long run-time, so for testing purposes I've tended to decrease it to, say, 10 to 25.
There are many filters in Python with different behavior around NaNs. For example, for a mean filter:
import numpy as np
from scipy.ndimage import uniform_filter

x = np.array([[0.1, 0.8, 0.2],
              [0.5, 0.2, np.nan],
              [0.7, 0.2, 0.9],
              [0.4, 0.7, 1],
              [np.nan, 0.14, 1]])
print(uniform_filter(x, size=3, mode='constant'))
[[ 0.17777778 nan nan]
[ 0.27777778 nan nan]
[ 0.3 nan nan]
[ nan nan nan]
[ nan nan nan]]
or
import numpy as np
from skimage.filters.rank import mean
from skimage.morphology import square
from skimage import img_as_float

x = np.array([[0.1, 0.8, 0.2],
              [0.5, 0.2, np.nan],
              [0.7, 0.2, 0.9],
              [0.4, 0.7, 1],
              [np.nan, 0.14, 1]])
print(mean(x, square(3)))
[[102 76 76]
[106 102 97]
[114 130 127]
[ 90 142 167]
[ 79 137 181]]
print(img_as_float(mean(x, square(3))))
[[ 0.4 0.29803922 0.29803922]
[ 0.41568627 0.4 0.38039216]
[ 0.44705882 0.50980392 0.49803922]
[ 0.35294118 0.55686275 0.65490196]
[ 0.30980392 0.5372549 0.70980392]]
skimage does not support NaNs and masking: reference
or
import numpy as np
from scipy.signal import convolve2d

x = np.array([[0.1, 0.8, 0.2],
              [0.5, 0.2, np.nan],
              [0.7, 0.2, 0.9],
              [0.4, 0.7, 1],
              [np.nan, 0.14, 1]])
core = np.full((3, 3), 1/3**2)  # 3x3 averaging kernel
print(convolve2d(x, core, mode='same'))
[[ 0.17777778 nan nan]
[ 0.27777778 nan nan]
[ 0.3 nan nan]
[ nan nan 0.43777778]
[ nan nan 0.31555556]]
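Back to the original masked-median problem: view_as_windows converts its input with np.ascontiguousarray, which returns a plain ndarray, so the mask is silently dropped and np.ma.median ends up operating on ordinary data. A sketch of one NaN-aware alternative (my own suggestion, untested on the OP's data) is to fill masked pixels with NaN and let scipy.ndimage.generic_filter apply np.nanmedian per window; it is slow for a 50x50 window, but it does respect the mask:

import numpy as np
from scipy.ndimage import generic_filter

# `image` and `mask` as in the question: mask == 1 marks invalid pixels
image_nan = np.where(mask.astype(bool), np.nan, image)

# NaN-ignoring median over each 50x50 neighbourhood, with NaN padding at edges
filtered = generic_filter(image_nan, np.nanmedian, size=50,
                          mode='constant', cval=np.nan)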
I have a tiff with lakes, which I converted into a 2D array. I would like to keep the outline of the lakes in a 2D array.
import rasterio
import numpy as np
with rasterio.open('myfile.tif') as dtm:
    array = dtm.read(1)

array[array > 0] = 1
array = array.astype(float)
array[array == 0] = np.nan
My array looks like this now, a lake can be seen in the upper right corner:
[[ nan nan nan ... 2888.001 **2877.458 2867.5798**]
[ nan nan nan ... 2890.188 **2879.2876 2869.0415**]
[ nan nan nan ... 2892.2622 2880.9907 2870.4985]
...
[ nan nan nan ... nan nan nan]
[ nan nan nan ... nan nan nan]
[ nan nan nan ... nan nan nan]]
To keep only the outline of the lakes, I have to set all values to NaN that are NOT located next to a NaN (the values to keep are marked in bold).
I have tried:
array[1:-1, 1:-1] = np.nan
However, this converts ALL inner values of the entire array to nan, not just the inner values of the lakes.
If you know of a completely different way how to keep the outline of the lakes (maybe with rasterio), I would also be thankful.
I hope I made clear what I mean with inner values of the lakes.
Tim
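A sketch of one possible approach with scipy.ndimage (my suggestion, assuming the lakes are the non-NaN cells): keep only the lake cells that have at least one NaN neighbour, and blank the interior.

import numpy as np
from scipy.ndimage import binary_dilation

lake = ~np.isnan(array)                # True inside lakes
# grow the NaN background by one cell (8-connectivity)
touching_nan = binary_dilation(~lake, structure=np.ones((3, 3)))
# keep lake cells adjacent to the background, set the rest to NaN
outline = np.where(lake & touching_nan, array, np.nan)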
I have a pandas dataframe called 'result' containing Longitude, Latitude and Production values. The dataframe looks like the following. For each pair of latitude and longitude there is one production value, therefore there are many NaN values.
Latitude    0.00000  32.00057  32.00078  ...  32.92114  32.98220  33.11217
Longitude                                ...
-104.5213       NaN       NaN       NaN  ...       NaN       NaN       NaN
-104.4745       NaN       NaN       NaN  ...       NaN       NaN       NaN
-104.4679       NaN       NaN       NaN  ...       NaN       NaN       NaN
-104.4678       NaN       NaN       NaN  ...       NaN       NaN       NaN
-104.4660       NaN       NaN       NaN  ...       NaN       NaN       NaN
This is my code:
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['figure.figsize'] = (12.0, 10.0)
plt.rcParams['font.family'] = "serif"

plt.figure(figsize=(14, 7))
plt.title('Heatmap based on ANN results')
sns.heatmap(result)
The heatmap plot looks like this
but I want it to look more like this
How do I adjust my code so it looks like the one in the second image?
I made a quick and dirty example of how you can smooth data in a numpy array. It should be directly applicable to pandas dataframes as well.
First I present the code, then go through it:
# Some needed packages
import numpy as np
import matplotlib.pyplot as plt
from scipy import sparse
from scipy.ndimage import gaussian_filter
np.random.seed(42)
# init an array with a lot of NaNs to imitate the OP's data
non_zero_entries = sparse.random(50, 60)   # ~1% of entries are non-zero
sparse_matrix = non_zero_entries.toarray()
sparse_matrix[sparse_matrix == 0] = np.nan

# set NaNs to 0 so the filter has something to work with
sparse_matrix[np.isnan(sparse_matrix)] = 0

# smooth the matrix
smoothed_matrix = gaussian_filter(sparse_matrix, sigma=5)

# set 0s back to NaN, as they will be ignored when plotting
# smoothed_matrix[smoothed_matrix == 0] = np.nan
sparse_matrix[sparse_matrix == 0] = np.nan
# Plot the data
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2,
sharex=False, sharey=True,
figsize=(9, 4))
ax1.matshow(sparse_matrix)
ax1.set_title("Original matrix")
ax2.matshow(smoothed_matrix)
ax2.set_title("Smoothed matrix")
plt.tight_layout()
plt.show()
The code is fairly simple. You can't smooth NaNs, so we have to get rid of them first; I set them to zero, but depending on your field you might want to interpolate them instead.
Using gaussian_filter we smooth the image, where sigma controls the width of the kernel.
The plot code yields the following images
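One caveat of zero-filling is that smoothed values near gaps get dragged toward zero. A sketch of a NaN-aware alternative using normalized convolution (my own addition, not part of the example above): smooth the zero-filled data and divide by a smoothed validity mask.

import numpy as np
from scipy.ndimage import gaussian_filter

def nan_gaussian(data, sigma):
    """Gaussian-smooth `data`, ignoring NaNs via normalized convolution."""
    valid = ~np.isnan(data)
    filled = np.where(valid, data, 0.0)
    smoothed = gaussian_filter(filled, sigma=sigma)
    weights = gaussian_filter(valid.astype(float), sigma=sigma)
    with np.errstate(invalid='ignore', divide='ignore'):
        return smoothed / weights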
I have a pandas dataframe with a column of continuous variables. I need to convert them into 3 bins, such that the first bin encompasses values below the 20th percentile, the second between the 20th and 80th percentiles, and the last above the 80th percentile.
I am trying to achieve this by first getting the bin boundaries for those percentiles and then using the pandas cut function. The issue is that I get an odd result: only the middle bin is encoded. Please see below:
test = [x for x in range(0,100)]
a = pd.DataFrame(test)
np.percentile(a, [20, 80])
Out[52]: array([ 19.8, 79.2])
pd.cut(a[0], np.percentile(a[0], [20, 80]))
...
15 NaN
16 NaN
17 NaN
18 NaN
19 NaN
20 (19.8, 79.2]
21 (19.8, 79.2]
22 (19.8, 79.2]
...
78 (19.8, 79.2]
79 (19.8, 79.2]
80 NaN
Why is that so? I thought pandas cut requires you to supply the boundaries of the bins you want. Supplying 2 boundaries, I expected to get 3 bins, but it seems it doesn't work this way.
If you need 3 bins, then you need 4 breaks:
test = [x for x in range(0,100)]
a = pd.DataFrame(test)
np.percentile(a, [0,20, 80,100])
Out[527]: array([ 0. , 19.8, 79.2, 99. ])
pd.cut(a[0], np.percentile(a[0], [0,20, 80,100]))
Also, pandas has qcut, which means you do not need to get the bin edges from numpy:
pd.qcut(a[0],[0,0.2,0.8,1])
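For completeness, a small sketch putting both together (the labels are my own choice, just for illustration); note that with explicit edges you need include_lowest=True, or the minimum value falls outside the first bin:

import numpy as np
import pandas as pd

a = pd.DataFrame(range(100))

# explicit quantile edges: include_lowest keeps the minimum in the first bin
edges = np.percentile(a[0], [0, 20, 80, 100])
binned = pd.cut(a[0], edges, include_lowest=True,
                labels=['low', 'mid', 'high'])

# qcut computes the same quantile edges itself
binned_q = pd.qcut(a[0], [0, 0.2, 0.8, 1], labels=['low', 'mid', 'high'])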
I am having some issues with a pretty simple piece of code I have written. I have 4 sets of data and want to generate polynomial best-fit lines using numpy polyfit. Three of the lists yield numbers when using polyfit, but the third data set yields NaN. Below are the code and the printout. Any ideas?
Code:
All of the ind_#'s are the lists of data. The code below converts them into numpy arrays that can then be used to generate the polynomial best-fit lines:
ind_1 = np.array(ind_1, dtype=float)
dep_1 = np.array(dep_1, dtype=float)
x_1 = np.arange(min(ind_1) - 1, max(ind_1) + 1, .01)

ind_2 = np.array(ind_2, dtype=float)
dep_2 = np.array(dep_2, dtype=float)
x_2 = np.arange(min(ind_2) - 1, max(ind_2) + 1, .01)

ind_3 = np.array(ind_3, dtype=float)
dep_3 = np.array(dep_3, dtype=float)
x_3 = np.arange(min(ind_3) - 1, max(ind_3) + 1, .01)

ind_4 = np.array(ind_4, dtype=float)
dep_4 = np.array(dep_4, dtype=float)
x_4 = np.arange(min(ind_4) - 1, max(ind_4) + 1, .01)
The code below prints the arrays generated above, as well as the polyfit output, which should be the coefficients of the polynomial equation. For the third case, all of the polyfit output prints as NaN:
print(ind_1)
print(dep_1)
print(np.polyfit(ind_1,dep_1,2))
print(ind_2)
print(dep_2)
print(np.polyfit(ind_2,dep_2,2))
print(ind_3)
print(dep_3)
print(np.polyfit(ind_3,dep_3,2))
print(ind_4)
print(dep_4)
print(np.polyfit(ind_4,dep_4,2))
Print out:
[ 1.405 1.871 2.713 ..., 5.367 5.404 2.155]
[ 0.274 0.07 0.043 ..., 0.607 0.614 0.152]
[ 0.01391925 -0.00950728 0.14803846]
[ 0.9760001 2.067 8.8 ..., 1.301 1.625 2.007 ]
[ 0.219 0.05 0.9810001 ..., 0.163 0.161 0.163 ]
[ 0.00886807 -0.00868727 0.17793324]
[ 1.143 0.9120001 2.162 ..., 2.915 2.865 2.739 ]
[ 0.283 0.3 0.27 ..., 0.227 0.213 0.161]
[ nan nan nan]
[ 0.167 0.315 1.938 ..., 2.641 1.799 2.719]
[ 0.6810001 0.7140001 0.309 ..., 0.283 0.313 0.251 ]
[ 0.00382331 0.00222269 0.16940372]
Why are the polyfit constants from the third case listed as NaN? All the data sets have the same type of data, and all of the code is consistent. Please help.
Just looked at your data. This is happening because you have a NaN in dep_3 (element 713). You can make sure that you only use finite values in the fit like this:
idx = np.isfinite(ind_3) & np.isfinite(dep_3)
print(np.polyfit(ind_3[idx], dep_3[idx], 2))
As for finding bad values in large datasets, numpy makes that really easy. You can find the indices like this:
print(np.where(~np.isfinite(dep_3)))
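If you'd rather not index manually, a hedged alternative is numpy's masked-array fit, which skips masked entries (my addition, not the answer's code):

import numpy as np

# mask NaNs/infs in both arrays; np.ma.polyfit ignores masked entries
x = np.ma.masked_invalid(ind_3)
y = np.ma.masked_invalid(dep_3)
print(np.ma.polyfit(x, y, 2))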
All -
I am trying to use SciPy's signal.lfilter function to filter a vector of samples - unfortunately, all that is returned is a vector of NaN.
I have plotted the frequency response of the filter, and the filter coefficients look correct; I'm fairly certain the issue is with the actual call to lfilter.
It's a high-pass Chebyshev type I filter, which I'm creating with:
b,a = signal.iirdesign(wp = 0.11, ws= 0.1, gstop= 60, gpass=1, ftype='cheby1')
I am then filtering data with:
filtered_data = signal.lfilter(b, a, data)
Below, I am printing a selection of 20 samples from the pre-filtered data, and then the filtered data. You can clearly see the issue:
### Printing a small selection of the data before it is filtered:
((-0.003070347011089325+0.0073614344000816345j), (-0.003162827342748642+0.007342938333749771j), (-0.003310795873403549+0.0073614344000816345j), (-0.0031813234090805054+0.007342938333749771j), (-0.003255307674407959+0.007398426532745361j), (-0.003162827342748642+0.007287450134754181j), (-0.003125835210084915+0.007509402930736542j), (-0.003162827342748642+0.007342938333749771j), (-0.0031073391437530518+0.007287450134754181j), (-0.0032368116080760956+0.007398426532745361j), (-0.0030888430774211884+0.007342938333749771j))
### Printing a small selection of the filtered data:
[ nan+nanj  nan+nanj  nan+nanj  nan+nanj  nan+nanj  nan+nanj  nan+nanj
  nan+nanj  nan+nanj  nan+nanj  nan+nanj  nan+nanj  nan+nanj  nan+nanj
  nan+nanj  nan+nanj  nan+nanj  nan+nanj  nan+nanj  nan+nanj]
Like I said before, the coefficients of the filter look good. They are:
b = [ 4.06886235e-02 -7.73083846e-01 6.95775461e+00 -3.94272761e+01
1.57709105e+02 -4.73127314e+02 1.10396373e+03 -2.05021836e+03
3.07532754e+03 -3.75873366e+03 3.75873366e+03 -3.07532754e+03
2.05021836e+03 -1.10396373e+03 4.73127314e+02 -1.57709105e+02
3.94272761e+01 -6.95775461e+00 7.73083846e-01 -4.06886235e-02]
a = [ 1.00000000e+00 -1.27730099e+01 7.81201390e+01 -3.03738394e+02
8.40827723e+02 -1.75902089e+03 2.88045462e+03 -3.77173152e+03
3.99609428e+03 -3.43732844e+03 2.38415171e+03 -1.30118368e+03
5.21654119e+02 -1.18026566e+02 -1.85597824e+01 3.24205235e+01
-1.65545917e+01 5.02665439e+00 -9.09697811e-01 7.68172820e-02]
So why would lfilter return only NaN? How am I using this function incorrectly?
Thanks in advance for your help!
Edit:
Okay, I solved it.
For anyone that encounters this in the future:
For whatever reason, even though the returned coefficients for the filter looked good, when I then used those coefficients in SciPy's lfilter function, the filtered values were unbounded. Simply changing the passband edge to ANY number other than 0.11 fixed the problem. Even this works:
b,a = signal.iirdesign(wp = 0.119, ws= 0.1, gstop= 60, gpass=1, ftype='cheby1')
Other than manually grepping through the poles and zeros of the filter, I'm not sure how you would detect instability of the filter. Bizarre.
An IIR filter is stable if the absolute values of the roots of the denominator of its discrete transfer function, a(z), are all less than one. So you can detect the instability with the following code:
from scipy import signal
import numpy as np
b1, a1 = signal.iirdesign(wp = 0.11, ws= 0.1, gstop= 60, gpass=1, ftype='cheby1')
b2, a2 = signal.iirdesign(wp = 0.119, ws= 0.1, gstop= 60, gpass=1, ftype='cheby1')
print "filter1", np.all(np.abs(np.roots(a1))<1)
print "filter2", np.all(np.abs(np.roots(a2))<1)