I am trying to interpolate an image cube with shape (dim_frequ, dim_spaxel1, dim_spaxel2) along the frequency axis. The aim is to oversample the frequency space. The array may contain NaNs. It would, of course, be possible to run two for loops over the array, but that's definitely too slow.
What I want in pseudo code:
import numpy as np
from scipy.interpolate import interp1d
dim_frequ, dim_spaxel1, dim_spaxel2 = 2559, 70, 70
cube = np.random.rand(dim_frequ, dim_spaxel1, dim_spaxel2)
cube.ravel()[np.random.choice(cube.size, 1000, replace=False)] = np.nan
wavelength = np.arange(1.31, 2.5894999999, 5e-4) # step chosen so that len(wavelength) == dim_frequ
wavelength_over = np.arange(1.31, 2.5894999999, 5e-5)
cube_over = interp1d(wavelength, cube, axis=0, kind='quadratic', fill_value="extrapolate")(wavelength_over)
cube_over[np.isnan(cube_over)] # array([], dtype=float64)
I've tried np.interp, which can only handle 1D data (?).
I've tried scipy.interpolate.interp1d, which can in principle handle arrays along a given axis, but returns NaNs (I assume because of the NaNs in the array). This actually works when kind='linear'; I'd like it a bit fancier though, and as soon as I set kind to 'quadratic' it returns NaNs.
I've tried scipy.interpolate.CubicSpline, which raises a ValueError, again because of the NaNs.
Any ideas what else to try? I am quite free in terms of the type of interpolation, but it shouldn't be too fancy, i.e. nothing crazier than a spline or a low-order polynomial.
So a couple of things.
First
This returns an empty array because cube_over has no NaNs in it after the above:
cube_over[np.isnan(cube_over)]
Since np.isnan(cube_over) is all False, nothing is selected. Otherwise, it appears to be interpolating everything onto the wavelength_over array.
Second
scipy doesn't like NaNs (see the docs). Typical practice is to drop the NaNs from your set of points before building the interpolation function, since they typically will not add any value to it.
Although it appears to work with your interp1d example above; I am guessing it dropped them along the axis when it built the interpolation function, but I am not sure.
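For illustration, here is a minimal 1-D sketch of that practice, with made-up data; applying it per spaxel of the cube would still require a loop (or np.apply_along_axis):

import numpy as np
from scipy.interpolate import interp1d

x = np.linspace(0.0, 1.0, 11)
y = np.sin(2 * np.pi * x)
y[[2, 7]] = np.nan                               # simulate missing samples

valid = ~np.isnan(y)                             # keep only the finite points
f = interp1d(x[valid], y[valid], kind='quadratic', fill_value='extrapolate')
y_over = f(np.linspace(0.0, 1.0, 101))           # oversampled, NaN-free result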
Third
What value do you actually want to interpolate? I am not sure what your desired output/endpoint is. It appears that your code is working more or less as expected when you interpolate onto your wavelength_over array, seeing as its values are so similar to (if not the same as) the wavelength array. I think you might benefit from a 2D interpolation method, but again I do not have a good understanding of your goal.
See the 2D interpolation options in the scipy docs.
Hope this helps.
Remark:
This is a contribution rather than a question, since I will answer my own question.
However, I am still interested in how the community would solve this problem.
So feel free to answer.
Story:
So when I was playing around with Qt in Python (i.e., PySide6) and its volume-rendering capabilities, I noticed some problems when setting my data array. Long story short: I didn't know (nor whether it is stated somewhere in the Qt documentation at all) that the provided texture has to have a shape where each dimension is a power of two.
Thus, I wanted to rescale my array to a shape which fulfills this criterion.
Calculating this shape with numpy is easy:
new_shape = numpy.power(2, numpy.ceil(numpy.log2(old_shape))).astype(int)
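For example, with a hypothetical old_shape:

import numpy
old_shape = (100, 60, 7)   # hypothetical input shape
new_shape = numpy.power(2, numpy.ceil(numpy.log2(old_shape))).astype(int)
print(new_shape)           # [128  64   8]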
Now the only problem left is to rescale my array with shape old_shape to the new array with shape new_shape and properly interpolate the values.
And since I am usually interested in generic approaches (who knows what this might be good for, and for whom, in the future), the following question arose:
Question
How to resize an arbitrary Numpy NDArray of shape old_shape to a Numpy NDArray of shape new_shape with proper interpolation?
I used scipy's RegularGridInterpolator to rescale my array, and it actually worked. Other interpolators should work as well.
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def resample_array_to_shape(array: np.ndarray, new_shape, method="linear"):
    # generate the grid coordinates for each entry in the original array
    entries = [np.arange(s) for s in array.shape]
    # the value at each point corresponds to its value in the original array
    interp = RegularGridInterpolator(entries, array, method=method)
    # new sample points, spread evenly over the original index ranges
    new_entries = [np.linspace(0, array.shape[i] - 1, new_shape[i]) for i in range(len(array.shape))]
    # use 'ij' indexing to avoid swapping axes
    new_grid = np.meshgrid(*new_entries, indexing='ij')
    # interpolate and cast back to the original dtype
    return interp(tuple(new_grid)).astype(array.dtype)
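A quick usage check with made-up data (the shapes are just for illustration):

data = np.random.rand(100, 60, 7)
resized = resample_array_to_shape(data, (128, 64, 8))
print(resized.shape)   # (128, 64, 8)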
I am looking for some mathematical guidance to help me find the index locations (red circles) of a curve as shown in the image below. The curve is just a 1D numpy array. I tried scipy's gaussian_filter1d. I also tried np.gradient, and I am not anywhere close to what I want. The gradient changes abruptly, so a first-order gradient should give what I am looking for. Then I realized the data is not smooth, so I tried smoothing with gaussian_filter1d. Even then, I am unable to pick up where it changes. I have various arrays of this type (same size, values ranging from 0 to 1), so the solution has to be generally applicable and not dependent on the given data set; I cannot hardcode anything. Any thoughts would be much appreciated.
CSV file
First you get a smooth function from your data using scipy's UnivariateSpline. Then you plot the area where the absolute slope is, say, at least 1/4 of its maximum.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline

y = np.loadtxt('curve.csv', delimiter=',')  # the 1-D data from the linked CSV (placeholder filename)
x = np.arange(len(y))                       # 0 .. 5499 for the sample data
f = UnivariateSpline(x, y, k=3, s=0.3)      # smoothing cubic spline
df = f.derivative()
cond = np.abs(df(x)) > 0.25 * np.max(np.abs(df(x)))  # slope at least 1/4 of its maximum
plt.plot(x, f(x))
plt.scatter(x[cond], f(x[cond]), c='r')
plt.show()
It looks like what you are looking for are the first and last of the marked points. So you do
(x[cond].min(),f(x[cond].min()).item()), (x[cond].max(), f(x[cond].max()).item())
And your points are:
((1455, 0.20595740349084446), (4230, 0.1722999962943679))
I've got a multidimensional array that has 1 million sets of 3 points, each point being a coordinate specified by x and y. Calling this array pointVec, what I mean is
np.shape(pointVec) = (1000000,3,2)
I want to find the center of each set of 3 points. One obvious way is to iterate through all 1 million sets, finding the center of each set at each iteration. However, I have heard that vectorization is a strong suit of NumPy's, so I'm trying to adapt it to this problem. Since this problem fits so intuitively with iteration, I don't have a grasp of how one might do it with vectorization, or whether vectorization would even be useful.
It depends how you define the center of a set of three points. However, if it is the average of the coordinates, as @Quang mentioned in the comments, you can take the mean along a specific axis in numpy:
pointVec.mean(1)
This takes the mean along axis=1 (the second axis, which holds the 3 points) and returns an array of shape (1000000, 2).
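A quick sketch with random data, just to confirm the shapes:

import numpy as np

pointVec = np.random.rand(1000000, 3, 2)
centers = pointVec.mean(axis=1)   # average the 3 points of every set at once
print(centers.shape)              # (1000000, 2)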
I want to make sure I am using numpy's correlate correctly; it is not giving me the answer I expect. Perhaps I am misunderstanding the correlate function. Here is a code snippet with comments:
import numpy as np
ref = np.sin(np.linspace(-2*np.pi, 2*np.pi, 10000)) # make some data
fragment = ref[2149:7022] # create a fragment of data from ref
corr = np.correlate(ref, fragment) # Find the correlation between the two
maxLag = np.argmax(corr) # find the maximum lag, this should be the offset that we chose above, 2149
print(maxLag)
2167 # I expected this to be 2149.
Isn't the index in the corr array where the correlation is maximum the lag between these two datasets? I would think the starting index I chose for the smaller dataset would be the offset with the greatest correlation.
Why is there a discrepancy between what I expect, 2149, and the result, 2167?
Thanks
That looks like a precision issue to me. Cross-correlation is an integral, and it will always have problems when represented in discrete space; I guess the problem arises when the values are close to 0. Maybe if you increase the magnitudes or the precision that difference will disappear, but I don't think it is really necessary, since you are already dealing with an approximation when using the discrete cross-correlation. Below is a graph of the correlation, so you can see that the values at the two lags are indeed close:
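For reference, a minimal sketch (reusing corr and maxLag from the snippet above) to reproduce that graph:

import matplotlib.pyplot as plt

plt.plot(corr)                                        # correlation value at each lag
plt.axvline(2149, color='r', ls='--', label='expected lag (2149)')
plt.axvline(maxLag, color='g', ls='--', label='argmax lag')
plt.legend()
plt.show()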
I am trying to extrapolate values from some endpoints as shown in the image below
extrapolated value illustration
I have tried using the scipy interp1d method as shown below
from scipy import interpolate
x = [1,2,3,4]
y = [0,1,2,0]
f = interpolate.interp1d(x,y,fill_value='extrapolate')
print(f(4.3))
output : -0.5999999999999996
Though this is correct, I also need a second extrapolated value: the intersection of X on segment i=1. The estimated value I am expecting is ~3.3, as seen from the graph in the image above. But I need to get this programmatically. I am hoping there is a way of returning multiple values from interp1d(...) or something similar. Any help will be much appreciated. Thanks in advance.
If you want to extrapolate based on all but the last pair of values, you can just build a second interpolator using x[:-1] and y[:-1].
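A minimal sketch, reusing the x and y from the question:

from scipy import interpolate

x = [1, 2, 3, 4]
y = [0, 1, 2, 0]
# drop the last pair so the fit follows segment i=1
f2 = interpolate.interp1d(x[:-1], y[:-1], fill_value='extrapolate')
print(f2(4.3))   # ~3.3, continuing the rising segment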