So I was doing my assignment, and we are required to use linear interpolation for it. We have been asked to use the interp1d function from scipy.interpolate to generate new y values, given new x values and the old coordinates (x1,y1) and (x2,y2).
To get the new x coordinates (let's call them x_new) I used np.linspace between x1 and x2, and the new y coordinates (let's call them y_new) I obtained by applying the interp1d function to x_new.
However, I also noticed that applying np.linspace to (y1,y2) generates exactly the same values as the y_new I got from interp1d on x_new.
Can anyone please explain to me why this is so? And if this is true, is it always true?
And if this is always true, why do we need the interp1d function at all when we can use np.linspace in its place?
Here is the code I wrote:
import scipy.interpolate as ip
import numpy as np
x = [-1.5, 2.23]
y = [0.1, -11]
x_new = np.linspace(start=x[0], stop=x[-1], num=10)
print(x_new)
y_new = np.linspace(start=y[0], stop=y[-1], num=10)
print(y_new)
f = ip.interp1d(x, y)
y_new2 = f(x_new)
print(y_new2) # y_new2 values always the same as y_new
The reason why you stumbled upon this is that you only use two points for an interpolation of a linear function. You have as input two different x values with corresponding y values. You then ask interp1d to find the linear function f(x) = m*x + b that best fits your input data. As you only have two points as input data, there is an exact solution, because a linear function is exactly defined by two points. To see this: take a piece of paper, draw two dots and then think about how many straight lines you can draw to connect these dots.
The linear function that you get from two input points is defined by the parameters m = (y1-y2)/(x1-x2) and b = y1 - m*x1, where (x1,y1), (x2,y2) are your two input points (or elements of your x and y arrays in your code snippet).
So, now what does np.linspace(start, stop, num, ...) do? It gives you num evenly spaced points between start and stop. These points are start, start + delta, ..., stop, with step width delta = (stop - start)/(num - 1). The -1 comes from the fact that you want to include the endpoint. So the nth point in your interval (counting from n = 0) lies at xn = x1 + n*(x2 - x1)/(num - 1). At what y values do these points end up after we apply our linear function from interp1d? Let's plug it in:
f(xn) = m*xn + b = (y1 - y2)/(x1 - x2) * (x1 + n*(x2 - x1)/(num - 1)) + y1 - (y1 - y2)/(x1 - x2)*x1. Simplifying this gives f(xn) = (y2 - y1)*n/(num - 1) + y1. And this is exactly what you get from np.linspace(y1, y2, num), i.e. f(xn) = yn!
Now, does this always work? No! We made use of the fact that our linear function is defined by the two endpoints of the interval we use in np.linspace. So this will not work in general. Try adding one more x value and one more y value to your input lists and then compare the results, for example with the quick check below.
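For instance, here is a small check with three points (the third point is made up just for illustration). Because the three points are not collinear, the interp1d result no longer matches np.linspace on the y values:
import scipy.interpolate as ip
import numpy as np

# three points that are NOT on a single straight line
x = [-1.5, 2.23, 5.0]
y = [0.1, -11, 4.0]

x_new = np.linspace(x[0], x[-1], num=10)

f = ip.interp1d(x, y)                 # piecewise-linear interpolant
y_interp = f(x_new)                   # follows both line segments

y_short = np.linspace(y[0], y[-1], num=10)  # straight line from first to last y

print(np.allclose(y_interp, y_short))  # False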
So I'm trying to get the second derivative of the following formula using numpy.gradient, differentiating it once by S[:,0] and then by S[:,1]:
S = np.random.multivariate_normal(mean, covariance, N)
formula = (S[:,0]**2) * (S[:,1]**2)
But the thing is when I use spacings as the second argument of numpy.gradient
dx = np.diff(S[:,0])
dy = np.diff(S[:,1])
dfdx = np.gradient(formula,dx)
I get the error saying
ValueError: when 1d, distances must match the length of the corresponding dimension
And I get that that's because the spacing vector's length is one element less than the formula's, but I don't know what to do to fix that.
I've also read somewhere that you can pass the coordinates of the points rather than the spacing as the second argument. But when I tried checking the result of that by differentiating the formula by S[:,0] and then by S[:,1], and then the other way round (by S[:,1] and then by S[:,0]), and comparing the two results, which should be similar, there was a huge difference between them.
Can anybody explain to me what I'm doing wrong here?
When passing the coordinates of your function's values to NumPy's gradient, you have to be careful to either pass a list with as many arrays as your function has dimensions, or to specify at which axis (as an argument of gradient) you want to calculate the gradient.
When you checked both ways of differentiation, I think the problem is that your formula isn't actually two-dimensional, but one-dimensional (even though you use data from two variables, note that your formula array has only one dimension).
Take a look at this little script in which we verify that, indeed, the order of differentiation doesn't alter the result (assuming your function is well-behaved).
import numpy as np
# Dummy arrays and function
x = np.linspace(0,1,50)
y = np.linspace(0,2,50)
f = np.sin(2*np.pi*x[:,None]) * np.cos(2*np.pi*y)
dfdx = np.gradient(f, x, axis = 0)
df2dy = np.gradient(dfdx, y, axis = 1)
dfdy = np.gradient(f, y, axis = 1)
df2dx = np.gradient(dfdy, x, axis = 0)
# Check how many values are essentially different
print(np.sum(~np.isclose(df2dx, df2dy)))
Does this apply to your problem?
I want to implement a function interpolate(x, y, x_new) that computes the linear interpolation of the unknown function f at a new point x_new. The sample is given in the form of two sequences x and y. Both sequences have the same length, and their elements are numbers. The x sequence contains the points where the function has been sampled, and the y sequence contains the function value at the corresponding point. (Without using an import statement.)
As I understand your question, you want to write some function y = interpolate(x_values, y_values, x), which will give you the y value at some x? The basic idea then follows these steps (see the sketch after the list):
Find the indices of the values in x_values which define an interval containing x. For instance, for x=3 with your example lists, the containing interval would be [x1,x2]=[2.5,3.4], and the indices would be i1=1, i2=2
Calculate the slope on this interval by (y_values[i2]-y_values[i1])/(x_values[i2]-x_values[i1]) (ie dy/dx).
The value at x is now the value at x1 plus the slope multiplied by the distance from x1.
You will additionally need to decide what happens if x is outside the interval of x_values: either treat it as an error, or extrapolate "backwards", assuming the slope is the same as in the first/last interval.
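Putting those steps together, a rough sketch in plain Python (no imports; the function name, variable names and the example lists at the end are just illustrative):
def interpolate(x_values, y_values, x):
    # assumes x_values is sorted in increasing order
    if x <= x_values[0]:
        i1, i2 = 0, 1                                    # extrapolate with the first interval's slope
    elif x >= x_values[-1]:
        i1, i2 = len(x_values) - 2, len(x_values) - 1    # ... or the last interval's slope
    else:
        # find the interval [x_values[i1], x_values[i2]] containing x
        i2 = 1
        while x_values[i2] < x:
            i2 += 1
        i1 = i2 - 1
    # slope on this interval (dy/dx)
    slope = (y_values[i2] - y_values[i1]) / (x_values[i2] - x_values[i1])
    # value at x1 plus the slope times the distance from x1
    return y_values[i1] + slope * (x - x_values[i1])

print(interpolate([1.0, 2.5, 3.4], [0.0, 5.0, 2.0], 3.0))  # uses the interval [2.5, 3.4]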
Did this help, or did you need more specific advice?
I have some x and y data, where for every entry in the x vector there's a corresponding entry in the y vector. Furthermore, the x data are not evenly spaced.
I'd like to interpolate between the x samples to obtain an even spacing in the x dimension, and to approximate the corresponding y values. scipy's interp1d seems like a natural solution, but my problem has a caveat: the x values are not monotonically increasing (because both x and y are functions of time). The interp1d function, and the other functions from the interpolate module, thus give weird results at those points where x reverses direction.
What I'd really like to do is simply fit a straight line between every set of two adjacent x points and then interpolate based on this very local approximation. Is there a function to do this in numpy or do I have to rig something up myself?
Could you sort your xy pairs and then use interp1d? Something like this?
import numpy as np

xy = sorted(zip(x, y), key=lambda p: p[0])   # sort the pairs by their x value
new_xy = np.array(xy)
x = new_xy[:, 0]
y = new_xy[:, 1]
Now your x's are monotonically increasing and the relationships have been preserved.
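Then you could run the interpolation on the sorted data as usual (just a sketch, reusing the sorted x and y from above):
from scipy.interpolate import interp1d

f = interp1d(x, y)
x_even = np.linspace(x.min(), x.max(), 100)   # evenly spaced x values
y_even = f(x_even)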
I have the following set of points in 3D space and I'd like to calculate the gradient everywhere, i.e. have a vector field returned.
import numpy as np

points = []
for i in np.linspace(-20, 20, 100):
    for j in np.linspace(-20, 20, 100):
        points.append([i, j, i**2 + j**2])
points = np.array(points)
It's an elliptic paraboloid.
Using np.gradient(points),
http://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html
I neither get the correct values nor the dimension I would expect. Can anyone give me a hint?
You are mixing together the indices and the values in 'points', so the gradient is giving you wrong results. Here is a better way to construct the points with numpy and calculate the gradient:
x, y = np.mgrid[-20:20:100j, -20:20:100j]
z = x**2 + y**2
grad = np.gradient(z)
The resulting gradient is a pair of arrays, one for the gradient along the first axis and another for the gradient along the second axis. Note that this gradient doesn't take into account the separation between points (i.e. delta x and delta y), so to get the derivative you need to divide by it:
dx = 40. / 99.                 # spacing of mgrid[-20:20:100j] along each axis
deriv = [g / dx for g in grad]
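Depending on your NumPy version, you can also pass the spacing directly to np.gradient, so no division is needed afterwards (a sketch, reusing z and dx from above):
grad_x, grad_y = np.gradient(z, dx, dx)   # derivatives along the first and second axes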
If you want to reconstruct your 'points' as before, you just need to do:
points = np.array([x.ravel(), y.ravel(), z.ravel()]).T
You may also be interested in numpy's diff function, which gives the discrete difference along a given axis.
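For instance (illustrative only, using the z grid from above):
dz0 = np.diff(z, axis=0)   # differences along the first axis; shape (99, 100)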
I am trying to fit a step function using scipy.optimize.leastsq. Consider the following example:
import numpy as np
from scipy.optimize import leastsq
def fitfunc(p, x):
    y = np.zeros(x.shape)
    y[x < p[0]] = p[1]
    y[p[0] < x] = p[2]
    return y
errfunc = lambda p, x, y: fitfunc(p, x) - y # Distance to the target function
x = np.arange(1000)
y = np.random.random(1000)
y[x < 250.] -= 10
p0 = [500.,0.,0.]
p1, success = leastsq(errfunc, p0, args=(x, y))
print(p1)
The parameters are the location of the step and the level on either side. What's strange is that the first free parameter never varies; if you run this, scipy will give
[ 5.00000000e+02 -4.49410173e+00 4.88624449e-01]
whereas the first parameter would be optimal when set to 250 and the second to -10.
Does anyone have any insight as to why this might not be working and how to get it to work?
If I run
print(np.sum(errfunc(p1, x, y)**2.))
print(np.sum(errfunc([250.,-10.,0.], x, y)**2.))
I find:
12547.1054663
320.679545235
where the first number is what leastsq is finding, and the second is the value for the actual optimal function it should be finding.
It turns out that the fitting is much better if I add the epsfcn= argument to leastsq:
p1, success = leastsq(errfunc, p0, args=(x, y), epsfcn=10.)
and the result is
[ 248.00000146 -8.8273455 0.40818216]
My basic understanding is that the first free parameter has to be moved by more than the spacing between neighboring points to affect the square of the residuals, and epsfcn has something to do with how big a step to use when estimating the gradient, or something similar.
I don't think that least squares fitting is the way to go about coming up with an approximation for a step. I don't believe it will give you a satisfactory description of the discontinuity. Least squares would not be my first thought when attacking this problem.
Why wouldn't you use a Fourier series approximation instead? You'll always be stuck with Gibbs' phenomenon at the discontinuity, but the rest of the function can be approximated as well as you and your CPU can afford.
What exactly are you going to use this for? Some context might help.
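For what it's worth, here is a quick illustration of the Fourier idea (everything in it is made up for the sketch): a truncated Fourier series for a unit square wave, showing the Gibbs overshoot at the jump.
import numpy as np

# approximate a square wave on [-pi, pi] by a truncated Fourier series
x = np.linspace(-np.pi, np.pi, 1001)
step = np.sign(x)

n_terms = 25
approx = np.zeros_like(x)
for k in range(1, 2 * n_terms, 2):          # odd harmonics only
    approx += (4 / np.pi) * np.sin(k * x) / k

# away from the jump the approximation is good, but right at the jump the
# partial sum overshoots (Gibbs phenomenon): the maximum is ~1.18 instead of 1.0
print(approx.max())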
I propose approximating the step function: instead of an infinite slope at the "change point", make it linear over one x distance (1.0 in the example). E.g. if the x parameter, xp, for the function is defined as the midpoint of this line, then the value at xp-0.5 is the lower y value, the value at xp+0.5 is the higher y value, and intermediate values of the function in the interval [xp-0.5; xp+0.5] are a linear interpolation between these two points.
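A sketch of such a smoothed fit function (my own illustration, with p = [xp, lower_y, higher_y] as in the question and a ramp that is one x unit wide):
import numpy as np

def fitfunc_smooth(p, x):
    xp, y_lo, y_hi = p
    y = np.empty_like(x, dtype=float)
    y[x <= xp - 0.5] = y_lo
    y[x >= xp + 0.5] = y_hi
    ramp = (xp - 0.5 < x) & (x < xp + 0.5)
    # linear interpolation between (xp - 0.5, y_lo) and (xp + 0.5, y_hi)
    y[ramp] = y_lo + (y_hi - y_lo) * (x[ramp] - (xp - 0.5))
    return y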
If it can be assumed that the step function (or its approximation) goes from a lower value to a higher value, then I think the initial guesses for the last two parameters should be the lowest y value and the highest y value respectively, instead of 0.0 and 0.0.
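In code that would be something along the lines of (using the y array from the question):
p0 = [500., y.min(), y.max()]   # step location guess, lowest y, highest y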
I have 2 corrections:
1) np.random.random() returns random numbers in the range 0.0 to 1.0. Thus the mean is +0.5, which is also the value of the third parameter (instead of 0.0). And the second parameter is then -9.5 (+0.5 - 10.0) instead of -10.0.
Thus
print(np.sum(errfunc([250.,-10.,0.], x, y)**2.))
should be
print(np.sum(errfunc([250.,-9.5,0.5], x, y)**2.))
2) In the original fitfunc(), one value of y becomes 0.0 if x is exactly equal to p[0]. Thus it is not a step function in that case (more like a sum of two step functions). E.g. this happens when the start value of the first parameter is 500.
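One way to close that gap (an illustrative tweak, not from the original post) is to make one of the comparisons inclusive:
y[x < p[0]] = p[1]
y[p[0] <= x] = p[2]   # now the element with x == p[0] also gets a value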
Most probably your optimization is stuck in a local minimum. I don't know exactly how leastsq works, but if you give it an initial estimate of (0, 0, 0), it gets stuck there, too.
You can check the gradient at the initial estimate numerically (evaluate at +/- epsilon for a very small epsilon, take the difference and divide by 2*epsilon) and I bet it will be something around 0.
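A quick sketch of that check for the sum of squared residuals (reusing errfunc, x and y from the question; the helper names are just illustrative):
import numpy as np

def cost(p):
    return np.sum(errfunc(p, x, y)**2.)

eps = 1e-6
p0 = np.array([500., 0., 0.])
num_grad = np.empty_like(p0)
for i in range(len(p0)):
    d = np.zeros_like(p0)
    d[i] = eps
    num_grad[i] = (cost(p0 + d) - cost(p0 - d)) / (2 * eps)
print(num_grad)   # the component for the step location comes out as 0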
Use statsmodels OLS. OLS uses ordinary least squares for curve fitting.