I am working on a small piece of code that starts with an interpolated surface I made previously. The interpolation filled the gaps in the surface with NaNs. Part of my processing involves looking at a local window around a particular point and calculating some measures from the local surface. Ideally, the code should only do these calculations if the local window contains no NaN values at all. The code iterates through the original large surface and checks whether the local window about each point has a NaN.
I know this is not the most efficient way to go about it, but time efficiency is not something I have to worry about.
Here is what I have so far:
for i in range(startx, endx):
    imin = i - half_tile
    imax = i + half_tile + 1
    for j in range(starty, endy):
        jmin = j - half_tile
        jmax = j + half_tile + 1
        # Test the local surface for NaNs
        z = surface[imin:imax, jmin:jmax]
        Test = np.isnan(sum(z))
        # conditional statement
        if Test:
            print('We have a nan')
            # set measures I want to calculate to zero
        else:
            print('We have a complete window')
            # do a set of calculations
The variable surface is the interpolated surface I created originally. The half_tile variable just defines the size of the local window I want to use, and startx, endx, starty, endy define the extent of the original surface to iterate through.
Where I am running into issues is that my conditional statement doesn't seem to be working. It tells me that the local window I am evaluating doesn't have any NaNs in it, but then the rest of my code (which I haven't shown here) fails because it says there are NaNs in the array.
An example of this might be:
[[ 7.07494104 7.04592032 7.01689961 6.98787889 6.95885817 6.92983745
6.90081674 6.87179602 6.8427753 6.81375458 6.78473387 6.75571315
6.72669243]
[ 7.10077447 7.07175376 7.04273304 7.01371232 6.98469161 6.95567089
6.92665017 6.89762945 6.86860874 6.83958802 6.8105673 6.78154658
6.75252587]
[ 7.12660791 7.09758719 7.06856647 7.03954576 7.01052504 6.98150432
6.9524836 6.92346289 6.89444217 6.86542145 6.83640073 6.80738002
6.7783593 ]
[ 7.15244134 7.12342063 7.09439991 7.06537919 7.03635847 7.00733776
6.97831704 6.94929632 6.9202148 6.89105825 6.86190169 6.83274514
6.80358859]
[ 7.17804068 7.14888413 7.11972758 7.09057103 7.06141448 7.03225793
7.00310137 6.97394482 6.94478827 6.91563172 6.88647517 6.85731862
nan]]
Here is an example of the local window that my code is evaluating; in my code this would be z. The entire array has good values except for the last one, which is a NaN.
The "checking" function in my code is not picking up that there is a NaN in the array. The conditional statement returns False when it should return True to indicate that a NaN is present. Am I missing something fundamental in the way I am checking the array, or are my methods just totally wrong?
np.isnan() returns an array with True or False for each element of the input array, so you need np.any() in addition to isnan(). See the example below:
import numpy as np
a = np.array([[1, 2, 3, 4], [1, 2, 3, np.nan]])
print(np.isnan(a))
print(np.any(np.isnan(a)))
results in
[[False False False False]
[False False False True]]
True
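Applied to the loop in the question, the check could then look something like this (a sketch that keeps the variable names from the question):
import numpy as np

for i in range(startx, endx):
    imin = i - half_tile
    imax = i + half_tile + 1
    for j in range(starty, endy):
        jmin = j - half_tile
        jmax = j + half_tile + 1
        z = surface[imin:imax, jmin:jmax]
        # np.isnan(z) is element-wise; np.any collapses it to a single boolean
        if np.any(np.isnan(z)):
            print('We have a nan')
            # set the measures to zero
        else:
            print('We have a complete window')
            # do the set of calculations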
The objective is to assign new values within certain ranges (b_top, b_low).
The code below is able to achieve the intended objective:
import numpy as np

b_top = np.array([1, 7])
b_low = np.array([3, 9]) + 1
Mask = np.zeros((1, 11), dtype=bool)
for x, y in zip(b_top, b_low):
    Mask[0, x:y] = True
However, I wonder whether there is a single-line approach, or a more efficient way of doing this?
You can turn b_top and b_low into a mask using np.cumsum and the fact that bool and int8 are the same itemsize.
header = np.zeros(Mask.shape[1], np.uint8)
header[b_top] = 1
header[b_low if b_low[-1] < header.size else b_low[:-1]] = -1
header.cumsum(out=Mask[0].view(np.int8))
I've implemented this function in a little utility library I made. The function is called haggis.math.runs2mask. You would call it as
from haggis.math import runs2mask
Mask[0] = runs2mask(np.stack((b_top, b_low), -1), Mask.shape[1])
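For comparison, here is a self-contained version of the cumsum idea, using the b_top and b_low values from the question (a sketch that uses a plain int8 scratch array instead of the uint8/int8 view trick above):
import numpy as np

b_top = np.array([1, 7])
b_low = np.array([3, 9]) + 1          # exclusive end of each run

Mask = np.zeros((1, 11), dtype=bool)

# +1 at every run start, -1 at every run end, then cumsum fills the runs
header = np.zeros(Mask.shape[1], np.int8)
header[b_top] += 1
header[b_low[b_low < header.size]] -= 1
Mask[0] = header.cumsum().astype(bool)

print(Mask)
# [[False  True  True  True False False False  True  True  True False]]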
I'm trying to translate this line of code from Python to MATLAB:
new_img[M[0, :] - corners[0][0], M[1, :] - corners[1][0], :] = img[T[0, :], T[1, :], :]
So, naturally, I wrote something like this:
new_img(M(1,:)-corners(2,1),M(2,:)-corners(2,2),:) = img(T(1,:),T(2,:),:);
But it gives me the following error when it reaches that line:
Requested 106275x106275x3 (252.4GB) array exceeds maximum array size
preference. Creation of arrays greater than this limit may take a long
time and cause MATLAB to become unresponsive. See array size limit or
preference panel for more information.
This has made me believe that it is not assigning things correctly. img is at most a 1000 × 1500 RGB image, and the same code runs in less than 5 seconds in Python. How can I do a vectorized assignment in MATLAB like the Python line above?
By the way, I didn't paste all of my code so this post doesn't get too long. If I need to add anything else, please let me know.
Edit:
Here's an explanation of what I want my code to do (basically, this is what the Python code does):
Consider this line of code. It's not real MATLAB code; I'm just trying to explain what I want to do:
A([2 3 5], [1 3 5]) = B([1 2 3], [2 4 6])
It is interpreted like this:
A(2,1) = B(1,2)
A(3,1) = B(2,2)
A(5,1) = B(3,2)
A(2,3) = B(1,4)
A(3,3) = B(2,4)
A(5,3) = B(3,4)
...
...
...
Instead, I want it to be interpreted like this:
A(2,1) = B(1,2)
A(3,3) = B(2,4)
A(5,5) = B(3,6)
When you do A[vector1, vector2] in Python, you index the set:
A[vector1[0], vector2[0]]
A[vector1[1], vector2[1]]
A[vector1[2], vector2[2]]
A[vector1[3], vector2[3]]
...
In MATLAB, the similar-looking A(vector1, vector2) instead indexes the set:
A(vector1(1), vector2(1))
A(vector1(1), vector2(2))
A(vector1(1), vector2(3))
A(vector1(1), vector2(4))
...
A(vector1(2), vector2(1))
A(vector1(2), vector2(2))
A(vector1(2), vector2(3))
A(vector1(2), vector2(4))
...
That is, you get each combination of indices. You should think of it as a sub-array composed of the rows and columns specified in the two vectors.
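As an aside, the MATLAB "all combinations" behaviour corresponds to np.ix_ in NumPy, while plain integer-array indexing pairs the indices element-wise. A small sketch with a made-up array to illustrate the difference:
import numpy as np

A = np.arange(36).reshape(6, 6)
rows = np.array([1, 2, 4])   # 0-based versions of MATLAB's [2 3 5]
cols = np.array([0, 2, 4])   # 0-based versions of MATLAB's [1 3 5]

# Pairwise indexing: picks A[1,0], A[2,2], A[4,4] -- what the Python line does
print(A[rows, cols])                 # [ 6 14 28]

# "All combinations" indexing: a 3x3 sub-array, like MATLAB's A(rows, cols)
print(A[np.ix_(rows, cols)])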
To accomplish the same as the Python code, you need to use linear indexing:
index = sub2ind(size(A), vector1, vector2);
A(index)
Thus, your MATLAB code should do:
index1 = sub2ind(size(new_img), M(1,:)-corners(2,1), M(2,:)-corners(2,2));
index2 = sub2ind(size(img), T(1,:), T(2,:));
% these indices are for first 2 dims only, need to index in 3rd dim also:
offset1 = size(new_img,1) * size(new_img,2);
offset2 = size(img,1) * size(img,2);
index1 = index1.' + offset1 * (0:size(new_img,3)-1);
index2 = index2.' + offset2 * (0:size(new_img,3)-1);
new_img(index1) = img(index2);
What the middle block does here is add linear indices for the same elements along the 3rd dimension. If ii is the linear index to an element in the first channel, then ii + offset1 is an index to the same element in the second channel, and ii + 2*offset1 is an index to the same element in the third channel, etc. So here we're generating indices to all those matrix elements. The + operation is doing implicit singleton expansion (what NumPy calls "broadcasting"). If you have an older version of MATLAB this will fail, and you will need to replace that A+B with bsxfun(@plus, A, B).
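For reference, the broadcasting mentioned above looks like this in NumPy; a toy example with made-up numbers, just to show how a column of indices plus a row of channel offsets expands into a full index matrix:
import numpy as np

index = np.array([3, 8, 12])      # pretend linear indices into the first channel
offset = 100                      # pretend per-channel offset (rows * columns)
channels = np.arange(3)           # channel numbers 0, 1, 2

# (3,1) + (3,) broadcasts to a (3,3) array: one column per channel
expanded = index[:, None] + offset * channels
print(expanded)
# [[  3 103 203]
#  [  8 108 208]
#  [ 12 112 212]]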
I have some code where I specify a folder containing n images, and the code should return the relative frequency histogram of each image.
From there I have a function call:
for image in total_images:
    histogram(image)
Here image is the current image the code is working on, and total_images is the collection of all n images in the previously specified folder.
From there I call the histogram() function, passing the current image the code is working on as a parameter.
My histogram() function is meant to return the relative frequency histogram of each image (rel_freq).
Although the returned values are correct, rel_freq should be a 1x256 array, with positions ranging from 0 to 255.
How can I turn the rel_freq variable into a 1x256 array, with each value stored in its corresponding position?
When I do len(rel_freq) it returns 256; that's when I realized that it is not in the format I need...
Again, although the returned data is correct...
After that, I need to create an array store_all of size len(total_images) x 256 to save all the rel_freq values...
I need to save all the rel_freq arrays so that I can later write them to an external file, such as a .txt.
I'm thinking of creating another function to do this...
Something like this; I do not know how to do it correctly, but I believe you will understand the logic...
def store_all_histograms(total_images):
    n = len(total_images)
    store_all = [n][256]
    for i in range(0, n):
        store_all[i] = rel_freq
I know the store_all_histograms() function is wrong; I just wrote it here to show more or less what I'm thinking of doing, but again, I do not know how to do it properly. At this point, the error I get is:
store_all = [n][256]
IndexError: list index out of range
In the end, I need the store_all variable to hold all the relative frequency histograms, for example like this:
position: 0 ... 256
store_all = [
[..., ..., ...],
[..., ..., ...],
.
.
.
n
]
Now here is the relevant block of code:
def histogram(path):
    global rel_freq
    # Part of the code that is not relevant to the question...
    rel_freq = [(float(item) / total_size) * 100 if item else 0 for item in abs_freq]

def store_all_histograms(total_images):
    n = len(total_images)
    store_all = [n][256]
    for i in range(0, n):
        store_all[i] = rel_freq

# Part of the code that is not relevant to the question...

# Call the functions
for fn in total_images:
    histogram(fn)
store_all_histograms(total_images)
I hope I have managed to be clear with the question.
Thanks in advance; if you need any additional information, just ask...
Return the result, don't use a global variable:
def histogram(path):
    return [(float(item) / total_size) * 100 if item else 0 for item in abs_freq]
Create an empty list:
store_all = []
and append your results:
for fn in total_images:
    store_all.append(histogram(fn))
Alternatively, use a list comprehension:
store_all = [histogram(fn) for fn in total_images]
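Since the question also mentions writing everything out to a .txt file afterwards, one way to do that is with numpy.savetxt (a sketch; the 'histograms.txt' filename and the format string are just placeholders):
import numpy as np

# Each row is one image's 256-bin relative-frequency histogram.
store_all = np.array([histogram(fn) for fn in total_images])   # shape (n, 256)
np.savetxt('histograms.txt', store_all, fmt='%.6f')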
for i in range(0, n):
    store_all[i+1] = rel_freq
Try this, perhaps? I'm a bit confused by the question, though, if I'm honest. Are you trying to shift the indexing of the array by 1, so that instead of accessing the first position with list[0] you access it with list[1]?
So you want it to act like this?
>>list = [0,1,2,3,4]
>>list[1]
0
This question may be a little specialist, but hopefully someone might be able to help. I normally use IDL, but for developing a pipeline I'm looking to use Python to improve running times.
My fits file handling setup is as follows:
import numpy
from astropy.io import fits

#Directory: /Users/UCL_Astronomy/Documents/UCL/PHASG199/M33_UVOT_sum/UVOTIMSUM/M33_sum_epoch1_um2_norm.img
with fits.open('...') as ima_norm_um2:
    #Open UVOTIMSUM file once and close it after extracting the relevant values:
    ima_norm_um2_hdr = ima_norm_um2[0].header
    ima_norm_um2_data = ima_norm_um2[0].data
    #Individual dimensions for number of x pixels and number of y pixels:
    nxpix_um2_ext1 = ima_norm_um2_hdr['NAXIS1']
    nypix_um2_ext1 = ima_norm_um2_hdr['NAXIS2']

#Compute the size of the images (you can also do this manually rather than calling these keywords from the header):
#Call the header and data from the UVOTIMSUM file with the relevant keyword extensions:
corrfact_um2_ext1 = numpy.zeros((ima_norm_um2_hdr['NAXIS2'], ima_norm_um2_hdr['NAXIS1']))
coincorr_um2_ext1 = numpy.zeros((ima_norm_um2_hdr['NAXIS2'], ima_norm_um2_hdr['NAXIS1']))

#Check that the dimensions are all the same:
print(corrfact_um2_ext1.shape)
print(coincorr_um2_ext1.shape)
print(ima_norm_um2_data.shape)

# Make a new image file to save the correction factors:
hdu_corrfact = fits.PrimaryHDU(corrfact_um2_ext1, header=ima_norm_um2_hdr)
fits.HDUList([hdu_corrfact]).writeto('.../M33_sum_epoch1_um2_corrfact.img')

# Make a new image file to save the corrected image to:
hdu_coincorr = fits.PrimaryHDU(coincorr_um2_ext1, header=ima_norm_um2_hdr)
fits.HDUList([hdu_coincorr]).writeto('.../M33_sum_epoch1_um2_coincorr.img')
I'm looking to then apply the following corrections:
# Define the variables from Poole et al. (2008) "Photometric calibration of the Swift ultraviolet/optical telescope":
alpha = 0.9842000
ft = 0.0110329
a1 = 0.0658568
a2 = -0.0907142
a3 = 0.0285951
a4 = 0.0308063
for i in range(nxpix_um2_ext1 - 1): #do begin
    for j in range(nypix_um2_ext1 - 1): #do begin
        if (numpy.less_equal(i, 4) | numpy.greater_equal(i, nxpix_um2_ext1-4) | numpy.less_equal(j, 4) | numpy.greater_equal(j, nxpix_um2_ext1-4)): #then begin
            #UVM2
            corrfact_um2_ext1[i,j] == 0
            coincorr_um2_ext1[i,j] == 0
        else:
            xpixmin = i-4
            xpixmax = i+4
            ypixmin = j-4
            ypixmax = j+4
            #UVM2
            ima_UVM2sum = total(ima_norm_um2[xpixmin:xpixmax,ypixmin:ypixmax])
            xvec_UVM2 = ft*ima_UVM2sum
            fxvec_UVM2 = 1 + (a1*xvec_UVM2) + (a2*xvec_UVM2*xvec_UVM2) + (a3*xvec_UVM2*xvec_UVM2*xvec_UVM2) + (a4*xvec_UVM2*xvec_UVM2*xvec_UVM2*xvec_UVM2)
            Ctheory_UVM2 = - alog(1-(alpha*ima_UVM2sum*ft))/(alpha*ft)
            corrfact_um2_ext1[i,j] = Ctheory_UVM2*(fxvec_UVM2/ima_UVM2sum)
            coincorr_um2_ext1[i,j] = corrfact_um2_ext1[i,j]*ima_sk_um2[i,j]
The above snippet is where things go wrong, as I have a mixture of IDL and Python syntax, and I'm just not sure how to convert certain aspects of IDL to Python. For example, I'm not quite sure how to handle ima_UVM2sum = total(ima_norm_um2[xpixmin:xpixmax,ypixmin:ypixmax]).
I'm also missing the part where it updates the correction factor and coincidence correction image files, I would say. If anyone has the patience to go over it with a fine-tooth comb and suggest the necessary changes, that would be excellent.
The original normalised image can be downloaded here: Replace ... in above code with this file
One very important thing about numpy is that it applies every mathematical and comparison function element-wise, so you probably don't need to loop through the arrays at all.
So start with the step where you convolve your image with a sum filter. For 2D images this can be done with astropy.convolution.convolve or scipy.ndimage.filters.uniform_filter.
I'm not sure exactly what you want, but I think you want a 9x9 sum filter, which could be realized with
from scipy.ndimage.filters import uniform_filter
ima_UVM2sum = uniform_filter(ima_norm_um2_data, size=9) * 9**2  # uniform_filter gives a local mean, so scale by the window area (81) to get a local sum
Since you want to discard any pixels that are at the borders (4 pixels), you can simply slice them away:
ima_UVM2sum_valid = ima_UVM2sum[4:-4,4:-4]
This ignores the first and last 4 rows and the first and last 4 columns (the "last" part is achieved by using a negative stop value).
Now you can calculate the corrections:
xvec_UVM2 = ft*ima_UVM2sum_valid
fxvec_UVM2 = 1 + (a1*xvec_UVM2) + (a2*xvec_UVM2**2) + (a3*xvec_UVM2**3) + (a4*xvec_UVM2**4)
Ctheory_UVM2 = - np.log(1-(alpha*ima_UVM2sum_valid*ft))/(alpha*ft)
These are all arrays, so you still do not need to loop.
Then you want to fill your two images. Be careful, because the correction array is smaller (we ignored the first and last rows/columns), so you have to fill the same region in the correction images:
corrfact_um2_ext1[4:-4,4:-4] = Ctheory_UVM2*(fxvec_UVM2/ima_UVM2sum_valid)
coincorr_um2_ext1[4:-4,4:-4] = corrfact_um2_ext1[4:-4,4:-4] * ima_sk_um2[4:-4,4:-4]
Still no loop, just numpy's mathematical functions. This means it is much faster (MUCH faster!) and does the same thing.
Maybe I have forgotten some slicing, which would yield a "not broadcastable" error; if so, please report back.
Just a note about your loop: Python's first axis is the second FITS axis, and Python's second axis is the first FITS axis. So if you do need to loop over the axes, bear that in mind so you don't end up with IndexErrors or unexpected results.
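Putting the pieces of this answer together, a rough end-to-end sketch might look like the following (this assumes ima_sk_um2 has been read in as an array of the same shape as ima_norm_um2_data, and scales the uniform_filter output by the window area to turn the local mean into a local sum):
import numpy as np
from scipy.ndimage import uniform_filter

# 9x9 local sum (uniform_filter returns the local mean, hence the factor 9**2)
ima_UVM2sum = uniform_filter(ima_norm_um2_data, size=9) * 9**2
ima_UVM2sum_valid = ima_UVM2sum[4:-4, 4:-4]

# Coincidence-loss correction from Poole et al. (2008)
xvec_UVM2 = ft * ima_UVM2sum_valid
fxvec_UVM2 = 1 + a1*xvec_UVM2 + a2*xvec_UVM2**2 + a3*xvec_UVM2**3 + a4*xvec_UVM2**4
Ctheory_UVM2 = -np.log(1 - alpha*ima_UVM2sum_valid*ft) / (alpha*ft)

# Fill only the interior region; the border stays at the zeros it was initialised with
corrfact_um2_ext1[4:-4, 4:-4] = Ctheory_UVM2 * (fxvec_UVM2 / ima_UVM2sum_valid)
coincorr_um2_ext1[4:-4, 4:-4] = corrfact_um2_ext1[4:-4, 4:-4] * ima_sk_um2[4:-4, 4:-4]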
My function takes the points of a polyline and removes the redundant points along any straight line segment.
The points fed in are as follows:
pts=[['639.625', '-180.719'], ['629.625', '-180.719'], ['619.625', '-180.719'], ['617.312', '-180.719'], ['610.867', '-182.001'], ['605.402', '-185.652'], ['601.751', '-191.117'], ['600.469', '-197.562'], ['600.469', '-207.562'], ['600.469', '-208.273']]
pta = [None]*2
ptb = [None]*2
ptc = [None]*2
simplepts = []
for pt in pts:
    if pta[0] == None:
        simplepts.append(pt)
        pta[:] = pt
        continue
    if ptb[0] == None:
        ptb[:] = pt
        continue
    if ptb == pta:
        ptb[:] = pt
        continue
    ptc[:] = pt
    print(simplepts)  #<--[['639.625', '-180.719'], ['605.402', '-185.652']]
    # we check if a, b and c are on a straight line
    # if they are, then b becomes c and the next point is allocated to c.
    # if they are not, then a becomes b and the next point is allocated to c
    if testforStraightline(pta, ptb, ptc):
        ptb[:] = ptc  # if it is straight
    else:
        simplepts.append(ptb)
        print(simplepts)  #<--[['639.625', '-180.719'], ['617.312', '-180.719']]
        pta[:] = ptb  # if it's not straight
If the section is not straight, then the ptb is appended to the simplepts array, which is now (correctly) [['639.625', '-180.719'], ['617.312', '-180.719']]
However, on the next pass the simplepts array has changed to [['639.625', '-180.719'], ['605.402', '-185.652']] which is baffling.
I presume that the points in my array are being held by reference only and changing other values updates the values in the array.
How do I make sure that my array values retain the values as they are assigned?
Thank you.
You are appending the list ptb to simplepts and then modifying it in place. I'm not sure whether you want to change your design, but a quick fix that keeps the current design is:
import copy
simplepts.append(copy.deepcopy(ptb))
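To see why a copy is needed at all, here is a tiny self-contained demonstration of the reference behaviour (unrelated to the polyline data, just illustrating the mechanism):
pta = ['1', '2']
collected = []
collected.append(pta)        # stores a reference to the same list object
collected.append(list(pta))  # stores an independent (shallow) copy
pta[:] = ['9', '9']          # in-place modification of the original list
print(collected)             # [['9', '9'], ['1', '2']]
Because the coordinates here are strings (immutable), a shallow copy such as list(ptb) or ptb[:] is enough; copy.deepcopy also works and is the safer general-purpose choice.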