Why does Matlab interp1 produce different results than numpy interp?

EDIT: Code edited to produce results consistent with Matlab. See below.
I am converting Matlab scripts to Python and the linear interpolation results are different in certain cases. I wonder why and if there is any way to fix this?
Here is the code example in both Matlab and Python and the resulting output (Note that t just so happens to be equal to tin in this case):
MATLAB:
t= [ 736696., 736696.00208333, 736696.00416667, 736696.00625, 736696.00833333, 736696.01041667, 736696.0125];
tin =[ 736696., 736696.00208333, 736696.00416667, 736696.00625, 736696.00833333, 736696.01041667, 736696.0125];
xin = [ nan , 1392., 1406. , 1418. , nan , 1442. , nan];
interp1(tin,xin,t)
ans =
NaN 1392 1406 1418 NaN 1442 NaN
Python (numpy):
(scipy interpolate.interp1d produces the same result as numpy)
import numpy as np

t   = np.array([736696., 736696.00208333, 736696.00416667, 736696.00625,
                736696.00833333, 736696.01041667, 736696.0125])
tin = np.array([736696., 736696.00208333, 736696.00416667, 736696.00625,
                736696.00833333, 736696.01041667, 736696.0125])
xin = np.array([np.nan, 1392., 1406., 1418., np.nan, 1442., np.nan])

x = np.interp(t, tin, xin)
# array([ nan, 1392., 1406., nan, nan, nan, nan])

# Edit
# Find the indices where t == tin and, wherever the np.interp output does not
# match the xin array, overwrite the np.interp output at those indices
same = np.where(t == tin)[0]
not_same = np.where(xin[same] != x[same])[0]
x[same[not_same]] = xin[same[not_same]]

It appears as if Matlab includes an additional equality check in its interpolation.
Linear 1-D interpolation is generally done by finding two x values which span the input value x and then calculating the result as:
y = y1 + (y2-y1)*(x-x1)/(x2-x1)
If you pass in an x value that is exactly equal to one of the input x coordinates, the routine will generally calculate the correct value, since x-x1 will be zero. However, if your input array has a nan as y1 or y2, that nan will propagate to the result.
Based on the code you posted, my best guess would be that Matlab's interpolation function has an additional check that is something like:
if x == x1:
    return y1
and that the numpy function does not have this check.
To achieve the same effect in numpy you could do:
np.where(t == tin, xin, np.interp(t, tin, xin))
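For example, with the arrays from the question (where t happens to equal tin element-wise, which is exactly when this trick applies), the wrapped call reproduces the Matlab output:
import numpy as np

tin = np.array([736696., 736696.00208333, 736696.00416667, 736696.00625,
                736696.00833333, 736696.01041667, 736696.0125])
t = tin.copy()
xin = np.array([np.nan, 1392., 1406., 1418., np.nan, 1442., np.nan])

# Take the sample value directly wherever t coincides with a breakpoint,
# and fall back to ordinary linear interpolation everywhere else.
x = np.where(t == tin, xin, np.interp(t, tin, xin))
# array([ nan, 1392., 1406., 1418.,  nan, 1442.,  nan])
Note that the element-wise t == tin comparison only makes sense when t and tin have the same shape; for an arbitrary query grid you would need to test membership against tin (for instance with np.isin) instead.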

Related

Min/max scaling with additional points

I'm trying to normalize an array within a range, e.g. [10,100]
But I also want to manually specify additional points in my result array, for example:
num = [1,2,3,4,5,6,7,8]
num_expected = [min(num), 5, max(num)]
expected_range = [10, 20, 100]
result_array = normalize(num, num_expected, expected_range)
Intended results:
Values from 1-5 are normalized to range (10,20].
5 in num array is mapped to 20 in expected range.
Values from 6-8 are normalized to range (20,100].
I know I can do it by normalizing the array twice, but I might have many additional points to add. I was wondering if there's any built-in function in numpy or scipy to do this?
I've checked MinMaxScaler in sklearn, but did not find the functionality I want.
Thanks!
Linear interpolation will do exactly what you want:
import scipy.interpolate
interp = scipy.interpolate.interp1d(num_expected, expected_range)
Then just pass numbers or arrays of numbers that you want to interpolate:
In [20]: interp(range(1, 9))
Out[20]:
array([ 10.        ,  12.5       ,  15.        ,  17.5       ,
        20.        ,  46.66666667,  73.33333333, 100.        ])
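If you would rather stay in numpy, np.interp gives the same piecewise-linear mapping; a minimal sketch using the names from the question:
import numpy as np

num = [1, 2, 3, 4, 5, 6, 7, 8]
num_expected = [min(num), 5, max(num)]   # breakpoints in the input scale
expected_range = [10, 20, 100]           # where those breakpoints should land

result_array = np.interp(num, num_expected, expected_range)
# array([ 10. ,  12.5,  15. ,  17.5,  20. ,  46.66666667,  73.33333333, 100. ])
The breakpoints in num_expected must be increasing, but you can add as many intermediate anchor points as you need.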

optimize function that reads and mirrors a half numpy matrix

I have a text file with the values of a matrix, but it only contains half of them, like this:
1. 1. 0.01
2. 1. 0.052145
2. 2. 0.045
3. 1. 0.054521
3. 2. 0.05424
3. 3. 0.05459898
The first two columns give the matrix (x, y) position and the last one the value at that position. The indices actually start at 1, so the zero-based matrix index is the value minus 1.
I made a function that reads the file and mirrors these values to a full matrix:
def expand_mirror_matrix(matrix_path='data.txt'):
    data = np.loadtxt(matrix_path)
    shape = (int(data[-1][0]), int(data[-1][1]))
    m = np.zeros(shape)
    for d in data:
        x, y, z = int(d[0]), int(d[1]), d[2]
        m[x-1, y-1] = z
        m[shape[0]-x, shape[1]-y] = z
    return m
But it does some unnecessary work, like the first and last iterations and the iteration that overwrites the value at the center of the matrix. Is there a way to optimize it? The file actually has thousands of lines, so it would be great to reduce the execution time of this loop.
I believe this does what you want, at least without the mirroring:
def expand_mirror_matrix(matrix_path='data.txt'):
    data = np.loadtxt(matrix_path)
    shape = (int(data[-1][0]), int(data[-1][1]))
    xs = data[:,0].astype(int) - 1  # Numpy uses zero-based indexing.
    ys = data[:,1].astype(int) - 1
    m = np.zeros(shape)
    m[(xs, ys)] = data[:,2]
    return m
For your example file above this returns:
array([[0.01      , 0.        , 0.        ],
       [0.052145  , 0.045     , 0.        ],
       [0.054521  , 0.05424   , 0.05459898]])
If you wish to mirror it you probably want to edit the above function with the following:
m[(xs, ys)] = data[:,2]
m[(ys, xs)] = data[:,2] # Mirrored.
The result of that is:
array([[0.01      , 0.052145  , 0.054521  ],
       [0.052145  , 0.045     , 0.05424   ],
       [0.054521  , 0.05424   , 0.05459898]])
Note that this assumes the matrix is square.
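As a quick self-contained check (np.loadtxt accepts any file-like object, so the sample data from the question can be fed in through io.StringIO), the mirrored result comes out symmetric:
import io
import numpy as np

sample = io.StringIO(
    "1. 1. 0.01\n"
    "2. 1. 0.052145\n"
    "2. 2. 0.045\n"
    "3. 1. 0.054521\n"
    "3. 2. 0.05424\n"
    "3. 3. 0.05459898\n")

data = np.loadtxt(sample)
xs = data[:, 0].astype(int) - 1
ys = data[:, 1].astype(int) - 1
m = np.zeros((int(data[-1][0]), int(data[-1][1])))
m[(xs, ys)] = data[:, 2]
m[(ys, xs)] = data[:, 2]  # mirror across the diagonal
print(np.allclose(m, m.T))  # True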

pearson correlation using np.random.rand failing

I have the following code to calculate the correlation coefficient using two different ways of generating the number series. It fails for the first way (corr_coeff_pearson) but works for the second way (corr_coeff_pearson_1). Why is this so? In both cases the variables are of class 'numpy.ndarray'.
import numpy as np
np.random.seed(1000)
inp_vct_lngt = 5
X = 2*np.random.rand(inp_vct_lngt,1)
y=4+3*X+np.random.randn(inp_vct_lngt,1)
print(type(X))
corr_coeff_pearson=0
corr_coeff_pearson = np.corrcoef(X,y)
print("Pearson Correlation:")
print(corr_coeff_pearson)
X_1 = np.random.randint(0,50,5)
y_1 = X_1 + np.random.normal(0,10,5)
print(type(X_1))
corr_coeff_pearson_1 = np.corrcoef(X_1,y_1)
print("Pearson Correlation:")
print(corr_coeff_pearson_1)
Is there some way to "convert" the number in the first way of generating the series that I am missing?
The issue is that X and y are two-dimensional:
>>> X
array([[1.9330627 ],
       [0.19204405],
       [0.21168505],
       [0.65018234],
       [0.83079548]])
>>> y
array([[8.60619212],
       [6.09210226],
       [5.33097283],
       [5.71649684],
       [5.18771916]])
So corrcoef is thinking
Each row of x represents a variable, and each column a single observation of all those variables
(quoted from the docs)
What you can do is either flatten the two to one dimension:
>>> np.corrcoef(X.flatten(), y.flatten())
array([[1.        , 0.84196446],
       [0.84196446, 1.        ]])
Or use rowvar=False:
>>> np.corrcoef(X, y, rowvar=False)
array([[1.        , 0.84196446],
       [0.84196446, 1.        ]])
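For completeness, a minimal sketch of what goes wrong with the original column vectors: corrcoef treats every row as its own variable, so the two (5, 1) arrays stacked together become ten variables with a single observation each, and you get a 10x10 matrix (full of nan, since one observation has no variance) instead of the 2x2 matrix you expect:
import numpy as np

np.random.seed(1000)
X = 2 * np.random.rand(5, 1)
y = 4 + 3 * X + np.random.randn(5, 1)

print(np.corrcoef(X, y).shape)                       # (10, 10): one "variable" per row
print(np.corrcoef(X.flatten(), y.flatten())[0, 1])   # ~0.84, the value you want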

Create a mask both for nan and inf values in an array

I have to remove both nan and inf values from two arrays.
I found this post useful https://stackoverflow.com/a/48591908/7541421 for removing nan. Is there a similar solution where I can create a mask to remove both nan and inf values?
The example below is just illustrative; my real arrays are large (400 elements).
import numpy as np
from numpy import nan, inf
a = np.asarray([0.5, 6.2, np.nan, 4.5, np.inf])
b = np.asarray([np.inf, np.inf, 0.3, np.nan, 0.5])
bad = ~np.logical_or(np.isnan(a), np.isnan(b))
X = np.compress(bad, a)
Y = np.compress(bad, b)
BIAS = np.nanmean(X - Y)
RMSE = np.sqrt(np.nanmean((X - Y)**2))
CORR = np.corrcoef(X, Y)
I need this in order to get both the statistics and plots correctly
You can use np.isfinite(). It will return a boolean mask with True wherever the values are neither infinite nor NaN.
You can get the finite values this way:
a = np.asarray(a)
a = a[np.isfinite(a)]
Or, to keep only the positions that are finite in both arrays:
mask = np.isfinite(a) & np.isfinite(b)
a = a[mask]
b = b[mask]
np.isfinite
Test element-wise for finiteness (not infinity or not Not a Number).
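Putting it together with the arrays from the question, a minimal sketch looks like this (note that in this tiny example no position is finite in both arrays, so X and Y come out empty; with the real 400-element arrays whatever is finite in both survives):
import numpy as np

a = np.asarray([0.5, 6.2, np.nan, 4.5, np.inf])
b = np.asarray([np.inf, np.inf, 0.3, np.nan, 0.5])

good = np.isfinite(a) & np.isfinite(b)   # True only where both values are usable
X = a[good]
Y = b[good]

# With a non-empty overlap the plain (non-nan) statistics are then safe:
# BIAS = np.mean(X - Y)
# RMSE = np.sqrt(np.mean((X - Y) ** 2))
# CORR = np.corrcoef(X, Y)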
It works fine for me. I used it to solve the problem in "merge NaN and masked arrays":
masked_red_diff = masked_red_diff[np.isfinite(masked_red_diff)]
masked_red_diff.mean()

How to solve real life difference equations using python

I want to solve a difference equation using python.
y(n) = x(n-1) - 0.5*(x(n-2) + x(n))
x here is a long array of values. I want to plot y with respect to another time sequence array t using Plotly. I can plot x with t, but I am not able to generate the filtered signal y. I have tried the code below, but it seems I'm missing something. I am not getting the desired output.
import numpy as np
from scipy import signal
from plotly.offline import plot, iplot
x = np.array(...)
t = np.array(...)  # x and t are big arrays
b = [-0.5, 1, -0.5]
a = 0
y = signal.lfilter(b, a, x, axis=-1, zi=None)
iplot([{"x": t, "y": y}])
However, the output is something like this.
>>> y
array([-inf, ..., nan])
Therefore, I am getting a blank graph.
UPDATE with examples of x and t (9 values each):
x = [3.1137561664814495,
-1.4589810840917137,
-0.12631870857936914,
-1.2695030212226599,
2.7600637824592158,
-1.7810937909691049,
0.050527483431747656,
0.27158522344564368,
0.48001109260160274]
t = [0.0035589523041146265,
0.011991765409288035,
0.020505576424579175,
0.028935389041247817,
0.037447199517441021,
0.045880011487565042,
0.054462819797731044,
0.062835632533346342,
0.071347441874490158]
It appears that your problem is that you define a = 0. When running your example, you get the following warning from SciPy:
/usr/local/lib/python2.7/site-packages/scipy/signal/signaltools.py:1353: RuntimeWarning:
divide by zero encountered in true_divide
[-inf inf nan nan nan inf -inf nan nan]
This division by zero comes from the value of a. The documentation of scipy.signal.lfilter points out the following:
a : array_like
The denominator coefficient vector in a 1-D sequence. If a[0] is not 1, then both a and b are normalized by a[0].
If you change a = 0 to a = 1 you should get the output you desire, although consider whether you want to normalize the data by a different factor.
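With the nine sample values from the question and a = 1, a quick sketch of the corrected call:
import numpy as np
from scipy import signal

x = np.array([3.1137561664814495, -1.4589810840917137, -0.12631870857936914,
              -1.2695030212226599, 2.7600637824592158, -1.7810937909691049,
              0.050527483431747656, 0.27158522344564368, 0.48001109260160274])

b = [-0.5, 1, -0.5]          # y[n] = -0.5*x[n] + x[n-1] - 0.5*x[n-2]
a = [1]                      # FIR filter, so the denominator is just 1
y = signal.lfilter(b, a, x)
print(y)                     # finite values now, no inf or nan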
