Interpolation between arrays in Python

What is the easiest and fastest way to interpolate between two arrays to get a new array?
For example, I have 3 arrays:
x = np.array([0,1,2,3,4,5])
y = np.array([5,4,3,2,1,0])
z = np.array([0,5])
x, y correspond to data points and z is an argument. So at z=0 the x array is valid, and at z=5 the y array is valid. But I need to get a new array for z=1. This could easily be solved by:
a = (y-x)/(z[1]-z[0])*1+x
The problem is that the data is not linearly dependent and there are more than 2 arrays of data. Maybe it is possible to use spline interpolation somehow?

This is a univariate-to-multivariate regression problem. Scipy supports univariate-to-univariate regression and multivariate-to-univariate regression, but you can iterate over the outputs instead, so this is not such a big problem. Below is an example of how it can be done. I've changed the variable names a bit and added a new point:
import numpy as np
from scipy.interpolate import interp1d
X = np.array([0, 5, 10])
Y = np.array([[0, 1, 2, 3, 4, 5],
              [5, 4, 3, 2, 1, 0],
              [8, 6, 5, 1, -4, -5]])
XX = np.array([0, 1, 5])  # Find YY for these
YY = np.zeros((len(XX), Y.shape[1]))
for i in range(Y.shape[1]):
    f = interp1d(X, Y[:, i])
    for j in range(len(XX)):
        YY[j, i] = f(XX[j])
So YY holds the results for XX. Hope it helps.
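Since the question mentions splines: interp1d also accepts a 2-D y array (via its axis argument) and higher-order kinds, so the loop above can be collapsed into a single call. A minimal sketch using the same data; kind='quadratic' is chosen here because a quadratic spline only needs the 3 available sample points along X:
import numpy as np
from scipy.interpolate import interp1d
X = np.array([0, 5, 10])
Y = np.array([[0, 1, 2, 3, 4, 5],
              [5, 4, 3, 2, 1, 0],
              [8, 6, 5, 1, -4, -5]])
# Interpolate all 6 columns of Y at once along axis 0 with a quadratic spline
f = interp1d(X, Y, axis=0, kind='quadratic')
YY = f(np.array([0, 1, 5]))  # shape (3, 6), one row per query point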

Related

Generate diagonal matrix from regression coefficient

I am trying to generate a diagonal matrix using a linear regression coefficient. First I generate an empty matrix, then I extract the coefficients from the regression model. Here's my code:
P = np.zeros((ncol, ncol), dtype = int)
intercep = np.zeros((1, ncol), dtype = int)
my_pls = PLSRegression(n_components = ncomp, scale=False)
model = my_pls.fit(x, y)
# extract PLS coefficient:
coef = model.coef_
intercep = model.y_mean_ - (model.x_mean_.dot(coef))
P[(i-k):(i+k), i-k] = np.diag(coef[0:ncol])
But I got zero matrices after running the code. Can anyone please help me out with how to get the diagonal matrix from the regression coefficient?
Not sure why you need to declare P first.
You can get a diagonal matrix directly from a 1D list/vector using numpy.diag:
x = [3, 5, 6, 7]
numpy.diag(x)
Output:
array([[3, 0, 0, 0],
       [0, 5, 0, 0],
       [0, 0, 6, 0],
       [0, 0, 0, 7]])
For your case, try P=np.diag(coef)
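One caveat, assuming a scikit-learn PLSRegression: its coef_ attribute is a 2-D array, and np.diag applied to a 2-D array extracts its diagonal instead of building a matrix, so the coefficients may need to be flattened first. A minimal sketch with hypothetical coefficients:
import numpy as np
coef = np.array([[3.0], [5.0], [6.0]])  # hypothetical 2-D coefficient array
P = np.diag(coef.ravel())  # flatten to 1-D, then build the diagonal matrix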

Get the least squares straight line for a set of points

For a set of points, I want to get the straight line that is the closest approximation of the points using a least squares fit.
I can find a lot of overly complex solutions here on SO and elsewhere but I have not been able to find something simple. And this should be very simple.
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
slope, intercept = leastSquares(x, y)
Is there some library function that implements the above leastSquares()?
numpy.linalg.lstsq can compute such a fit for you. There is an example in the documentation that does exactly what you need.
https://numpy.org/doc/stable/reference/generated/numpy.linalg.lstsq.html#numpy-linalg-lstsq
To summarize it here …
>>> x = np.array([0, 1, 2, 3])
>>> y = np.array([-1, 0.2, 0.9, 2.1])
>>> A = np.stack([x, np.ones(len(x))]).T
>>> m, c = np.linalg.lstsq(A, y, rcond=None)[0]
>>> m, c
(1.0, -0.95)  # may vary
Well, for one, for an ordinary least squares fit of a single line you can derive a closed-form solution for the coefficients, if I'm not utterly mistaken, though there are some pitfalls with numerical stability (a sketch of the closed form follows the scipy example below).
If you look for least squares in general, you'll find more general and thus more complex solutions, because least squares can be done for many more models than just the linear one.
But maybe the sklearn package with its LinearRegression model might easily do what you want? https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
Or, for more detailed control, the scipy package: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lstsq.html
import numpy as np
from scipy.linalg import lstsq
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
# Turn x into a 2d array; raise it to powers 0 (for the y-axis intercept)
# and 1 (for the slope part)
M = x[:, np.newaxis] ** [0, 1]
p, res, rnk, s = lstsq(M, y)
intercept, slope = p[0], p[1]
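As mentioned above, the single-line fit also has a closed-form solution: slope = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²) and intercept = (Σy - slope*Σx) / n. A minimal sketch on the question's data:
import numpy as np
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
n = len(x)
slope = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
intercept = (np.sum(y) - slope * np.sum(x)) / n
print(slope, intercept)  # slope ≈ 7.1, intercept ≈ 17.0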
Here's one way to implement the least squares regression:
import numpy as np
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])

def leastSquares(x, y):
    # solve the normal equations (A^T A) theta = A^T y
    A = np.vstack([x, np.ones(len(x))]).T
    y = y[:, np.newaxis]
    slope, intercept = np.linalg.inv(A.T @ A) @ A.T @ y
    return slope, intercept

slope, intercept = leastSquares(x, y)
You can try with Moore-Penrose pseudo-inverse:
import numpy as np
from scipy import linalg
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
A = np.array([x, np.ones(len(x))])  # shape (2, 4)
B = linalg.pinv(A)  # Moore-Penrose pseudo-inverse, shape (4, 2)
sol = np.reshape(y, (1, len(y))) @ B
slope, intercept = sol[0, 0], sol[0, 1]

Python array optimization with two constraints

I have an optimization problem where I'm trying to find an array that needs to optimize two functions simultaneously.
In the minimal example below I have two known arrays w and x and an unknown array y. I initialize array y to contain only 1s.
I then specify the function np.sqrt(np.sum((x - y)**2)) and want to find the array y where
np.sqrt(np.sum((x - y)**2)) approaches 5
np.sqrt(np.sum((w - y)**2)) approaches 8
The code below successfully optimizes y with respect to a single array, but I would like to find the solution that optimizes y with respect to both x and w simultaneously, and am unsure how to specify the two constraints.
y should only consist of values greater than 0.
Any ideas on how to go about this ?
import numpy as np
from scipy import optimize as opt

w = np.array([6, 3, 1, 0, 2])
x = np.array([3, 4, 5, 6, 7])
y = np.array([1, 1, 1, 1, 1])

def func(x, y):
    z = np.sqrt(np.sum((x - y)**2)) - 5
    return np.zeros(x.shape[0],) + z

# root passes the current candidate as the first argument of func;
# args supplies the fixed target array
r = opt.root(func, x0=y, args=(x,), method='hybr')
print(r.x)
# array([1.97522498 3.47287981 5.1943792 2.10120135 4.09593969])
print(np.sqrt(np.sum((x-r.x)**2)))
# 5.0
One option is to use scipy.optimize.minimize instead of root. Here you have multiple solver options, and some of them (e.g. SLSQP) allow you to specify multiple constraints. Note that I changed the variable names so that x is the array you want to optimise and y and z define the constraints.
from scipy.optimize import minimize
import numpy as np
x0 = np.array([1, 1, 1, 1, 1])
y = np.array([6, 3, 1, 0, 2])
z = np.array([3, 4, 5, 6, 7])
constraint_x = dict(type='ineq',
                    fun=lambda x: x)  # fulfilled if > 0
constraint_y = dict(type='eq',
                    fun=lambda x: np.linalg.norm(x - y) - 5)  # fulfilled if == 0
constraint_z = dict(type='eq',
                    fun=lambda x: np.linalg.norm(x - z) - 8)  # fulfilled if == 0
# constraint_x could be added to the list below to enforce positivity;
# for this data the solution comes out positive anyway
res = minimize(fun=lambda x: np.linalg.norm(x),
               constraints=[constraint_y, constraint_z], x0=x0,
               method='SLSQP', options=dict(ftol=1e-8))  # default 1e-6
print(res.x) # [1.55517124 1.44981672 1.46921122 1.61335466 2.13174483]
print(np.linalg.norm(res.x-y)) # 5.00000000137866
print(np.linalg.norm(res.x-z)) # 8.000000000930026
This is a minimizer, so besides the constraints it also wants a function to minimize. I chose just the norm of the array being optimized, but setting the function to a constant (i.e. lambda x: 1) would have also worked.
Note also that the constraints are not exactly fulfilled; you can increase the accuracy by setting the optional argument ftol to a smaller value, e.g. 1e-10.
For more information see also the documentation and the corresponding sections for each solver.
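An alternative worth mentioning (my suggestion, not part of the answer above): treat the two target distances as residuals and use scipy.optimize.least_squares, whose bounds argument enforces the positivity requirement directly. A minimal sketch:
import numpy as np
from scipy.optimize import least_squares

w = np.array([6, 3, 1, 0, 2])
x = np.array([3, 4, 5, 6, 7])

def residuals(y):
    # one residual per target distance; both should go to 0 at a solution
    return [np.linalg.norm(x - y) - 5,
            np.linalg.norm(w - y) - 8]

res = least_squares(residuals, x0=np.ones(5), bounds=(0, np.inf))
print(np.linalg.norm(x - res.x))  # should be close to 5
print(np.linalg.norm(w - res.x))  # should be close to 8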

I want to calculate slope and intercept of a linear fit using pykalman module

Consider the linear regression of Y on X, where (xi, yi) = (2, 7), (0, 2), (5, 14) for i = 1, 2, 3. The solution is (a, b) = (2.395, 2.079), obtained using the regression function on a hand-held calculator.
I want to calculate the slope and the intercept of a linear fit using
the pykalman module. I'm getting
ValueError: The shape of all parameters is not consistent. Please re-check their values.
I'd really appreciate if someone would help me.
Here is my code:
from pykalman import KalmanFilter
import numpy as np
measurements = np.asarray([[7], [2], [14]])
initial_state_matrix = [[1], [1]]
transition_matrix = [[1, 0], [0, 1]]
observation_covariance_matrix = [[1, 0],[0, 1]]
observation_matrix = [[2, 1], [0, 1], [5, 1]]
kf1 = KalmanFilter(n_dim_state=2, n_dim_obs=6,
                   transition_matrices=transition_matrix,
                   observation_matrices=observation_matrix,
                   initial_state_mean=initial_state_matrix,
                   observation_covariance=observation_covariance_matrix)
kf1 = kf1.em(measurements, n_iter=0)
(smoothed_state_means, smoothed_state_covariances) = kf1.smooth(measurements)
print(smoothed_state_means)
Here's the code snippet:
from pykalman import KalmanFilter
import numpy as np

kf = KalmanFilter()
(filtered_state_means, filtered_state_covariances) = kf.filter_update(
    filtered_state_mean=[[0], [0]],
    filtered_state_covariance=[[90000, 0], [0, 90000]],
    observation=np.asarray([[7], [2], [14]]),
    transition_matrix=np.asarray([[1, 0], [0, 1]]),
    observation_matrix=np.asarray([[2, 1], [0, 1], [5, 1]]),
    observation_covariance=np.asarray([[.1622, 0, 0], [0, .1622, 0], [0, 0, .1622]]))
print(filtered_state_means)
print(filtered_state_covariances)
for x in range(0, 1000):
    (filtered_state_means, filtered_state_covariances) = kf.filter_update(
        filtered_state_mean=filtered_state_means,
        filtered_state_covariance=filtered_state_covariances,
        observation=np.asarray([[7], [2], [14]]),
        transition_matrix=np.asarray([[1, 0], [0, 1]]),
        observation_matrix=np.asarray([[2, 1], [0, 1], [5, 1]]),
        observation_covariance=np.asarray([[.1622, 0, 0], [0, .1622, 0], [0, 0, .1622]]))
print(filtered_state_means)
print(filtered_state_covariances)
filtered_state_covariance was chosen large because we have no idea where our filtered_state_mean lies initially, and the observations are just [[y1], [y2], [y3]]. The observation_matrix is [[x1, 1], [x2, 1], [x3, 1]], which makes the second state element our intercept. Imagine it like this: y1 = m*x1 + c, where m and c are the slope and intercept respectively. In our case filtered_state_mean = [[m], [c]]. Notice that the new filtered_state_means is used as filtered_state_mean for the next kf.filter_update() (in the iterating loop), because we now know where the mean lies, together with filtered_state_covariance = filtered_state_covariances. Iterating 1000 times converges the mean to the real value. If you want to know more about the function/method used, see: https://pykalman.github.io/
If the system state does not change between measurements (also called a vacuous movement step), then the transition_matrix φ is the identity I.
I'm not sure if what I'm going to say now is true or not, so please correct me if I am wrong.
The observation_covariance matrix must be of size m x m, where m is the number of observations (in our case 3). The diagonal elements are just the variances, I believe: variance_y1, variance_y2 and variance_y3, and the off-diagonal elements are covariances. For example, element (1, 2) in the matrix is the covariance of y1 and y2 (a comma, not a product) and is equal to element (2, 1). Similarly for the other elements. Can someone help me include uncertainty in x1, x2 and x3? I mean, how do you implement uncertainties in x in the above code?
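As a sanity check (my addition, not part of the answer above): the ordinary least-squares fit for these three points can be computed directly with np.polyfit, and the converged Kalman estimate should agree with it:
import numpy as np
x = np.array([2, 0, 5])
y = np.array([7, 2, 14])
slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
print(slope, intercept)  # approximately 2.395, 2.079, matching the calculator result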

Numpy meshgrid in 3D

Numpy's meshgrid is very useful for converting two vectors to a coordinate grid. What is the easiest way to extend this to three dimensions? So given three vectors x, y, and z, construct 3x3D arrays (instead of 2x2D arrays) which can be used as coordinates.
Numpy (as of 1.8, I think) now supports higher-than-2D generation of position grids with meshgrid. One important addition which really helped me is the ability to choose the indexing order (either xy or ij for Cartesian or matrix indexing, respectively), which I verified with the following example:
import numpy as np
x_ = np.linspace(0., 1., 10)
y_ = np.linspace(1., 2., 20)
z_ = np.linspace(3., 4., 30)
x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
assert np.all(x[:,0,0] == x_)
assert np.all(y[0,:,0] == y_)
assert np.all(z[0,0,:] == z_)
Here is the source code of meshgrid:
def meshgrid(x, y):
    """
    Return coordinate matrices from two coordinate vectors.

    Parameters
    ----------
    x, y : ndarray
        Two 1-D arrays representing the x and y coordinates of a grid.

    Returns
    -------
    X, Y : ndarray
        For vectors `x`, `y` with lengths ``Nx=len(x)`` and ``Ny=len(y)``,
        return `X`, `Y` where `X` and `Y` are ``(Ny, Nx)`` shaped arrays
        with the elements of `x` and y repeated to fill the matrix along
        the first dimension for `x`, the second for `y`.

    See Also
    --------
    index_tricks.mgrid : Construct a multi-dimensional "meshgrid"
                         using indexing notation.
    index_tricks.ogrid : Construct an open multi-dimensional "meshgrid"
                         using indexing notation.

    Examples
    --------
    >>> X, Y = np.meshgrid([1,2,3], [4,5,6,7])
    >>> X
    array([[1, 2, 3],
           [1, 2, 3],
           [1, 2, 3],
           [1, 2, 3]])
    >>> Y
    array([[4, 4, 4],
           [5, 5, 5],
           [6, 6, 6],
           [7, 7, 7]])

    `meshgrid` is very useful to evaluate functions on a grid.

    >>> x = np.arange(-5, 5, 0.1)
    >>> y = np.arange(-5, 5, 0.1)
    >>> xx, yy = np.meshgrid(x, y)
    >>> z = np.sin(xx**2+yy**2)/(xx**2+yy**2)
    """
    x = asarray(x)
    y = asarray(y)
    numRows, numCols = len(y), len(x)  # yes, reversed
    x = x.reshape(1, numCols)
    X = x.repeat(numRows, axis=0)
    y = y.reshape(numRows, 1)
    Y = y.repeat(numCols, axis=1)
    return X, Y
It is fairly simple to understand. I extended the pattern to an arbitrary number of dimensions; this code is by no means optimized (and not thoroughly error-checked either), but you get what you pay for. Hope it helps:
import numpy as np

def meshgrid2(*arrs):
    arrs = tuple(reversed(arrs))
    lens = list(map(len, arrs))  # list() so it can be indexed and reused
    dim = len(arrs)
    sz = 1
    for s in lens:
        sz *= s
    ans = []
    for i, arr in enumerate(arrs):
        slc = [1] * dim
        slc[i] = lens[i]
        arr2 = np.asarray(arr).reshape(slc)
        for j, sz in enumerate(lens):
            if j != i:
                arr2 = arr2.repeat(sz, axis=j)
        ans.append(arr2)
    return tuple(ans)
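A quick usage sketch (my addition): because of the reversed() call, the outputs come back in reverse input order, each shaped (len(z_), len(y_), len(x_)):
x_ = np.arange(2)
y_ = np.arange(3)
z_ = np.arange(4)
Z, Y, X = meshgrid2(x_, y_, z_)
print(X.shape)  # (4, 3, 2)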
Can you show us how you are using np.meshgrid? There is a very good chance that you really don't need meshgrid because numpy broadcasting can do the same thing without generating a repetitive array.
For example,
import numpy as np
x=np.arange(2)
y=np.arange(3)
[X,Y] = np.meshgrid(x,y)
S=X+Y
print(S.shape)
# (3, 2)
# Note that meshgrid associates y with the 0-axis, and x with the 1-axis.
print(S)
# [[0 1]
# [1 2]
# [2 3]]
s=np.empty((3,2))
print(s.shape)
# (3, 2)
# x.shape is (2,).
# y.shape is (3,).
# x's shape is broadcasted to (3,2)
# y varies along the 0-axis, so to get its shape broadcasted, we first upgrade it to
# have shape (3,1), using np.newaxis. Arrays of shape (3,1) can be broadcasted to
# arrays of shape (3,2).
s=x+y[:,np.newaxis]
print(s)
# [[0 1]
# [1 2]
# [2 3]]
The point is that S=X+Y can and should be replaced by s=x+y[:,np.newaxis], because the latter does not require (possibly large) repetitive arrays to be formed. It also generalizes easily to higher dimensions (more axes): you just add np.newaxis where needed to effect broadcasting as necessary.
See http://www.scipy.org/EricsBroadcastingDoc for more on numpy broadcasting.
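For example, a 3D version of the same idea, following the same axis convention (a minimal sketch):
import numpy as np
x = np.arange(2)  # varies along the last axis
y = np.arange(3)  # varies along the middle axis
z = np.arange(4)  # varies along the first axis
# broadcasting builds the (4, 3, 2) result without materializing full grids
s = x + y[:, np.newaxis] + z[:, np.newaxis, np.newaxis]
print(s.shape)  # (4, 3, 2)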
I think what you want is
X, Y, Z = numpy.mgrid[-10:10:100j, -10:10:100j, -10:10:100j]
for example.
Here is a multidimensional version of meshgrid that I wrote:
import numpy as np

def ndmesh(*args):
    args = [np.asarray(a) for a in args]
    return np.broadcast_arrays(*[x[(slice(None),) + (None,) * i]
                                 for i, x in enumerate(args)])
Note that the returned arrays are views of the original array data, so changing the original arrays will affect the coordinate arrays.
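A short usage sketch (my addition), keeping the view caveat above in mind:
x = np.arange(2)
y = np.arange(3)
z = np.arange(4)
X, Y, Z = ndmesh(x, y, z)
print(X.shape, Y.shape, Z.shape)  # (4, 3, 2) for each; X varies along the last axis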
Instead of writing a new function, numpy.ix_ should do what you want.
Here is an example from the documentation:
>>> ixgrid = np.ix_([0,1], [2,4])
>>> ixgrid
(array([[0],
        [1]]), array([[2, 4]]))
>>> ixgrid[0].shape, ixgrid[1].shape
((2, 1), (1, 2))
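The arrays returned by np.ix_ form an "open" grid (like ogrid): their shapes broadcast against each other, so they can be combined arithmetically into a full grid. A small sketch (my addition):
>>> grid = np.ix_([1, 2, 3], [4, 5, 6, 7])
>>> grid[0] + grid[1]  # broadcasts to the full (3, 4) grid
array([[ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10]])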
You can achieve that by changing the order:
import numpy as np
xx = np.array([1,2,3,4])
yy = np.array([5,6,7])
zz = np.array([9,10])
y, z, x = np.meshgrid(yy, zz, xx)
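With the default indexing='xy', the first two output axes are swapped relative to the input order, so each of the returned arrays here has shape (len(zz), len(yy), len(xx)):
assert x.shape == y.shape == z.shape == (2, 3, 4)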
