Python: Use polyval to predict X passing Y

I have 2 sets of points (X, Y). I want to:
Use polyfit to fit the points
Given a Y, predict an X
This is the dataset:
X Y
-0.00001 5.400000e-08
-0.00001 5.700000e-08
0.67187 1.730000e-07
1.99997 9.150000e-07
2.67242 1.582000e-06
4.00001 3.734000e-06
4.67193 5.414000e-06
5.99998 9.935000e-06
6.67223 1.311300e-05
8.00000 2.102900e-05
Which looks like this:
I have seen numpy has the function polyval. But there you pass an X and get a Y. How do I reverse it?

As I said in the comments, you can subtract the y value, fit an appropriate degree polynomial, and then find its roots. numpy is easily good enough for that task.
Here is a simple example:
import numpy as np

x = np.arange(-10, 10.1, 0.3)
y = x ** 2

def find_x_from_y(x, y, deg, value, threshold=1E-6):
    # subtract the y value, fit a polynomial, then find the roots of it
    r = np.roots(np.polyfit(x, y - value, deg))
    # return only the real roots; due to numerical errors, you
    # must introduce a threshold value for the complex part
    return r.real[abs(r.imag) < threshold]

>>> find_x_from_y(x, y, 2, 0.5)
array([ 0.70710678, -0.70710678])
Finding roots is a numerical algorithm: it produces numerical approximations of the actual roots. This might result in really small, but nonzero, imaginary parts. To avoid this, you need a small threshold to distinguish real and complex roots. This is why you can't really use np.isreal:
>>> np.isreal(3.2+1E-7j)
False
A visual example with a degree 3 polynomial:
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10, 10.1, 0.3)
y = x ** 3 - 3 * x ** 2 - 9 * x

def find_x_from_y(x, y, deg, value, threshold=1E-6):
    r = np.roots(np.polyfit(x, y - value, deg))
    return r.real[abs(r.imag) < threshold]

value = -10
rts = find_x_from_y(x, y, 3, value)

fig = plt.figure(figsize=(10, 10))
plt.plot(x, y)
plt.axhline(value, color="r")
for r in rts:
    plt.axvline(r, color="k")
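Applied to the dataset in the question, the curve looks roughly quadratic, so a degree 2 fit is a reasonable first guess (an assumption on my part; check the fit quality on your real data):
import numpy as np

X = np.array([-0.00001, -0.00001, 0.67187, 1.99997, 2.67242,
              4.00001, 4.67193, 5.99998, 6.67223, 8.00000])
Y = np.array([5.400000e-08, 5.700000e-08, 1.730000e-07, 9.150000e-07,
              1.582000e-06, 3.734000e-06, 5.414000e-06, 9.935000e-06,
              1.311300e-05, 2.102900e-05])

def find_x_from_y(x, y, deg, value, threshold=1E-6):
    r = np.roots(np.polyfit(x, y - value, deg))
    return r.real[abs(r.imag) < threshold]

# a degree 2 polynomial has two roots after shifting by the target Y;
# keep the one that falls inside the measured X range
candidates = find_x_from_y(X, Y, 2, 1e-05)
print(candidates[(candidates >= X.min()) & (candidates <= X.max())])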


Plotting arrows perpendicular to coordinates

I have a plot like this, plotting a semicircle with x and y
I want to add arrows at each point like so (ignore the horrible paint job):
Is there an easy way to add arrows perpendicular to the plot?
Current code:
import numpy as np
import matplotlib.pyplot as plt
r = 2
h = 0
k = 0
x0 = h-r
x1 = h+r
x = np.linspace(x0,x1,9)
y = k + np.sqrt(r**2 - (x-h)**2)
plt.scatter(x,y)
plt.xlim(-4,4)
plt.ylim(-4,4)
Edit: the arrows should be perpendicular to the tangent of the curve. Sorry, I forgot to add this.
A point in space has no idea what "perpendicular" means, but assuming your y is some function of x that has a derivative, you can take the derivative of the function at a point as the direction of the tangent of the curve at that point. To get a perpendicular vector, you just rotate the tangent vector counter-clockwise by 90 degrees:
x1, y1 = -y0, x0
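A minimal sketch of that idea on the semicircle from the question, with the tangent direction estimated numerically via np.gradient (my addition, not part of the original answer):
import numpy as np
import matplotlib.pyplot as plt

r = 2
x = np.linspace(-r, r, 9)
y = np.sqrt(r**2 - x**2)

# estimate the tangent direction (dx, dy) at each point
dx = np.gradient(x)
dy = np.gradient(y)

# rotate the tangent 90 degrees counter-clockwise: (x1, y1) = (-y0, x0)
plt.scatter(x, y)
plt.quiver(x, y, -dy, dx, width=0.005)
plt.xlim(-4, 4)
plt.ylim(-4, 4)
plt.show()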
We know that these points come from a circle. So given three points we can easily find the center using basic geometry notions. If you need a refresher, take a look here.
For this particular case, the center is at the origin. Knowing the center coordinates, the normal at each point is just the vector from the center to the point itself. Since the center is the origin, the normals' components are just given by the coordinates of the points themselves.
import numpy as np
import matplotlib.pyplot as plt
r = 2
h = 0
k = 0
x0 = h-r
x1 = h+r
x = np.linspace(x0, x1, 9)
y = k + np.sqrt(r**2 - (x-h)**2)
center = np.array([0.0, 0.0])
plt.scatter(x, y)
plt.quiver(x, y, x, y, width=0.005)
plt.xlim(-4, 4)
plt.ylim(-4, 4)
plt.show()
If you are in a hurry and you do not have time to implement equations, you could use the scikit-spatial library in the following way:
from skspatial.objects import Circle, Vector, Points
import numpy as np
import matplotlib.pyplot as plt
r = 2
h = 0
k = 0
x0 = h-r
x1 = h+r
x = np.linspace(x0, x1, 9)
y = k + np.sqrt(r**2 - (x-h)**2)
points = Points(np.vstack((x, y)).T)
circle = Circle.best_fit(np.vstack((x, y)).T)
center = circle.point
normals = np.array([Vector.from_points(center, point) for point in points])
plt.scatter(x, y)
plt.quiver(x, y, normals[:, 0], normals[:, 1], width=0.005)
plt.xlim(-4, 4)
plt.ylim(-4, 4)
plt.show()
The premise of blunova's and simon's answers is correct, generally speaking: points have no normal, but curves do; so you need to rely on what you know your curve is. Either, as blunova described, you use the knowledge that it is a circle and compute the normals ad hoc from that knowledge.
Or, as I am about to describe, you use the function f such that y = f(x), together with what is known about the normal to such an (x, f(x)) chart.
Here is your code, rewritten with such a function f:
import numpy as np
import matplotlib.pyplot as plt

r = 2
h = 0
k = 0
x0 = h-r
x1 = h+r
x = np.linspace(x0, x1, 9)

def f(x):
    return k + np.sqrt(r**2 - (x-h)**2)

y = f(x)
plt.scatter(x, y)
plt.xlim(-4, 4)
plt.ylim(-4, 4)
So, all I did here was rewrite your y = ... line in the form of a function.
From there, it is possible to compute the normal at each point of the chart (x, f(x)).
The tangent at a point (x, f(x)) is well known: it is the vector (1, f'(x)), where f'(x) is the derivative of f. So the normal to that is (-f'(x), 1).
Divide by √(f'(x)²+1) to normalize this vector.
So, just use that as input to quiver.
First, compute a numerical derivative of your function:
dx = 0.001
def fprime(x):
    return (f(x+dx) - f(x-dx)) / (2*dx)
Then, just
plt.quiver(x, f(x), -fprime(x), 1)
Or, to have all vectors normalized:
plt.quiver(x, f(x), -fprime(x)/np.sqrt(fprime(x)**2+1), 1/np.sqrt(fprime(x)**2+1))
(note that fprime and the normalization part are all vectorizable operations, so this works with x being an array)
All together:
import numpy as np
import matplotlib.pyplot as plt

r = 2
h = 0
k = 0
x0 = h-r
x1 = h+r

def f(x):
    return k + np.sqrt(r**2 - (x-h)**2)

dx = 0.001
x = np.linspace(x0+dx, x1-dx, 9)
y = f(x)

def fprime(x):
    return (f(x+dx) - f(x-dx)) / (2*dx)

plt.scatter(x, y)
plt.quiver(x, f(x), -fprime(x)/np.sqrt(fprime(x)**2+1), 1/np.sqrt(fprime(x)**2+1))
plt.xlim(-4, 4)
plt.ylim(-4, 4)
plt.show()
That is almost an exact copy of your code, except for the quiver line and the addition of fprime.
One other slight change, specific to your curve: I changed the x range to ensure that fprime is computable (if the first x were x0, then fprime would need f(x0-dx), which does not exist because of the sqrt; likewise for x1). So the first x is x0+dx and the last is x1-dx, which is visually the same.
That is the main advantage of this solution over blunova's: it is essentially your code, and it would still work if you changed f, without assuming that f is a circle. All that is assumed is that f is differentiable (and if it were not, you could not define what those normals are anyway).
For example, if you want to do the same with a parabola instead, just change f:
import numpy as np
import matplotlib.pyplot as plt

r = 2
h = 0
k = 0
x0 = h-r
x1 = h+r

def f(x):
    return x**2

dx = 0.001
x = np.linspace(x0+dx, x1-dx, 9)
y = f(x)

def fprime(x):
    return (f(x+dx) - f(x-dx)) / (2*dx)

plt.scatter(x, y)
plt.quiver(x, f(x), -fprime(x)/np.sqrt(fprime(x)**2+1), 1/np.sqrt(fprime(x)**2+1))
plt.xlim(-4, 4)
plt.ylim(-2, 5)
plt.show()
All I changed here is the formula for f. No new reasoning is needed to compute the normals.
Last remark: an even more accurate version (avoiding the approximate computation of fprime with a finite dx) would be to use sympy to define f and then compute the exact, symbolic derivative of f. But that does not seem necessary for your case.
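A quick sketch of that sympy variant for the circle example (my addition; it assumes f stays a simple closed-form expression):
import numpy as np
import sympy as sp

xs = sp.Symbol('x')
f_expr = sp.sqrt(2**2 - xs**2)      # r = 2, h = k = 0
fprime_expr = sp.diff(f_expr, xs)   # exact symbolic derivative

# compile the symbolic expressions into NumPy-callable functions
f = sp.lambdify(xs, f_expr, 'numpy')
fprime = sp.lambdify(xs, fprime_expr, 'numpy')

x = np.linspace(-1.9, 1.9, 9)
norm = np.sqrt(fprime(x)**2 + 1)
u, v = -fprime(x)/norm, 1/norm      # normal components, as derived above
plt.quiver(x, f(x), u, v) then works exactly as before, with no finite-difference error in the derivative.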

Fourier series in Python

I am trying to implement the complex exponential Fourier series for f(x) defined on [-L, L], using these formulas:
f(x) = sum_n c_n * exp(i*n*pi*x/L), with c_n = (1/(2L)) * integral from -L to L of f(x) * exp(-i*n*pi*x/L) dx
I want to be able to implement these without calling the Fourier functions in other libraries since I want to also understand what's going on. Here is my attempt,
import numpy as np
from matplotlib import pyplot as plt

steps = 100
dt = 1/steps
L = np.pi
t = np.linspace(-L, L, steps)

def constant(X, Y, n):
    return (1/(2*L))*sum([y*np.exp((1j*n*np.pi*t)/L)*dt for t, y in zip(X, Y)])

def complex_fourier(X, Y, N):
    _X, _Y = [], []
    for t in X:
        f = 0
        for n in range(-N//2, N//2 + 1):
            c = constant(X, Y, n)
            f += c*np.exp((-1j*n*np.pi*t)/L)
        _X += [f.real]
        _Y += [f.imag]
    return _X, _Y

X, Y = complex_fourier(t, np.sin(t), 50)
plt.plot(X, Y, 'k.')
# plt.plot(t, np.sin(t))
plt.show()
The plot seems to be almost random and does not improve with more c terms. Could someone point out exactly what I am doing wrong?
Just to answer the "random plot" part of the question for now - note the Y-scale of your plot!
>>> np.min(Y), np.max(Y)
(-6.1937063114043705e-18, 6.43002899658067e-18)
>>> np.min(X), np.max(X)
(-0.15754356079010426, 0.15754356079010395)
In other words, all of your coefficients are basically real valued. You probably wouldn't be interested in a plot of the imaginary part vs the real part, but rather in the sum of squares vs the frequency or mode number.
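For instance, reusing the constant() helper from the question, here is a sketch of the power per mode (my addition; I take dt from the actual sample spacing, which the original code does not):
import numpy as np
from matplotlib import pyplot as plt

steps = 100
L = np.pi
t = np.linspace(-L, L, steps)
dt = t[1] - t[0]  # actual sample spacing on [-L, L]
y = np.sin(t)

def constant(X, Y, n):
    return (1/(2*L))*sum(yv*np.exp((1j*n*np.pi*tv)/L)*dt for tv, yv in zip(X, Y))

ns = range(-10, 11)
power = [abs(constant(t, y, n))**2 for n in ns]
plt.stem(ns, power)  # sin(t) should concentrate its power at n = -1 and 1
plt.xlabel("mode number n")
plt.ylabel("|c_n|^2")
plt.show()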

equivalent to numpy.linalg.lstsq that allows weighting

I am fitting a 2d polynomial with the numpy function linalg.lstsq:
coeffs = np.array([y*0+1, y, x, x**2, y**2]).T
coeff_r, r, rank, s = np.linalg.lstsq(coeffs, values)
Some points that I am trying to fit are more reliable than others.
Is there a way to weigh the points differently?
Thanks
lstsq is enough for this; the weights can be applied to the equations. That is, if in an overdetermined system
3*a + 2*b = 9
2*a + 3*b = 4
5*a - 4*b = 2
you care about the first equation more than about the others, multiply it by some number greater than 1. For example, by 5:
15*a + 10*b = 45
2*a + 3*b = 4
5*a - 4*b = 2
Mathematically, the system is the same, but the least squares solution will be different because it minimizes the sum of squares of the residuals, and the residual of the 1st equation got multiplied by 5.
Here is an example based on your code (with small adjustments to make it more NumPythonic). First, unweighted fit:
import numpy as np
x, y = np.meshgrid(np.arange(0, 3), np.arange(0, 3))
x = x.ravel()
y = y.ravel()
values = np.sqrt(x+y+2) # some values to fit
functions = np.stack([np.ones_like(y), y, x, x**2, y**2], axis=1)
coeff_r = np.linalg.lstsq(functions, values, rcond=None)[0]
values_r = functions.dot(coeff_r)
print(values_r - values)
This displays the residuals as
[ 0.03885814 -0.00502763 -0.03383051 -0.00502763 0.00097465 0.00405298
-0.03383051 0.00405298 0.02977753]
Now I give the 1st data point greater weight.
weights = np.ones_like(x)
weights[0] = 5
coeff_r = np.linalg.lstsq(functions*weights[:, None], values*weights, rcond=None)[0]
values_r = functions.dot(coeff_r)
print(values_r - values)
Residuals:
[ 0.00271103 -0.01948647 -0.04828936 -0.01948647 0.00820407 0.0112824
-0.04828936 0.0112824 0.03700695]
The first residual is now an order of magnitude smaller, of course at the expense of the other residuals.
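One extra design note (my addition, not part of the original answer): multiplying a row by w makes its squared residual count w² times in the objective. If you have statistical weights w_i that should multiply the squared residuals directly, scale the rows by sqrt(w_i) instead:
import numpy as np

x, y = np.meshgrid(np.arange(0, 3), np.arange(0, 3))
x = x.ravel()
y = y.ravel()
values = np.sqrt(x + y + 2)
functions = np.stack([np.ones_like(y), y, x, x**2, y**2], axis=1)

w = np.ones_like(values)
w[0] = 25.0       # the first squared residual should count 25 times
sw = np.sqrt(w)   # row scaling that achieves exactly those weights

coeff_r = np.linalg.lstsq(functions * sw[:, None], values * sw, rcond=None)[0]
This is equivalent to weights[0] = 5 in the example above.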

Solving an ordinary differential equation on a fixed grid (preferably in python)

I have a differential equation of the form
dy(x)/dx = f(y,x)
that I would like to solve for y.
I have an array xs containing all of the values of x for which I need ys.
For only those values of x, I can evaluate f(y,x) for any y.
How can I solve for ys, preferably in python?
MWE
import numpy as np
# these are the only x values that are legal
xs = np.array([0.15, 0.383, 0.99, 1.0001])
# some made up function --- I don't actually have an analytic form like this
def f(y, x):
if not np.any(np.isclose(x, xs)):
return np.nan
return np.sin(y + x**2)
# now I want to know which array of ys satisfies dy(x)/dx = f(y,x)
Assuming you can use something simple like Forward Euler...
Numerical solutions will rely on approximate solutions at previous times. So if you want a solution at t = 1 it is likely you will need the approximate solution at t<1.
My advice is to figure out what step size will allow you to hit the times you need, and then find the approximate solution on an interval containing those times.
import numpy as np

# from your example, the smallest step size required to hit all points would be 0.0001
a = 0       # start point
b = 1.5     # possible end point
h = 0.0001
n = int((b - a) / h) + 1
y = np.zeros(n)
t = np.linspace(a, b, n)
y[0] = 0.1  # initial condition here
for i in range(1, n):
    y[i] = y[i-1] + h * f(y[i-1], t[i-1])  # f(y, x) as defined in the question
Alternatively, you could use an adaptive step method (which I am not prepared to explain right now) to take larger steps between the times you need.
Or, you could find an approximate solution over an interval using a coarser mesh and interpolate the solution.
Any of these should work.
I think you should first solve the ODE on a regular grid, and then interpolate the solution onto your fixed grid. Here is approximate code for your problem:
import numpy as np
from scipy.integrate import odeint
from scipy import interpolate
xs = np.array([0.15, 0.383, 0.99, 1.0001])
# dy/dx = f(x,y)
def dy_dx(y, x):
return np.sin(y + x ** 2)
y0 = 0.0 # init condition
x = np.linspace(0, 10, 200)  # here you can control the accuracy
sol = odeint(dy_dx, y0, x)
f = interpolate.interp1d(x, np.ravel(sol))
ys = f(xs)
But dy_dx(y, x) should always return something reasonable (not np.nan, as in your MWE).
Here is the drawing for this case
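A variant of the same idea (my addition; it assumes a reasonably recent scipy) is solve_ivp with dense_output=True, which lets you evaluate the solution directly at xs without a separate interpolation step:
import numpy as np
from scipy.integrate import solve_ivp

xs = np.array([0.15, 0.383, 0.99, 1.0001])

# note: solve_ivp uses the (x, y) argument order, the reverse of odeint
def dy_dx(x, y):
    return np.sin(y + x ** 2)

sol = solve_ivp(dy_dx, (0, 10), [0.0], dense_output=True)
ys = sol.sol(xs)[0]  # continuous solution evaluated at the fixed grid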

Search numpy array ((x, y, z)...) for z matching nearest x, y

I have a very large array similar to elevation data of the format:
triplets = ((x0, y0, z0),
            (x1, y1, z1),
            ...,
            (xn, yn, zn))
where x, y, z are all floats in metres. You can create suitable test data matching this format with:
x = np.arange(20, 40, dtype=np.float64)
y = np.arange(30, 50, dtype=np.float64)
z = np.random.random(20) * 25.0
triplets = np.column_stack((x, y, z))  # shape (20, 3), rows of (x, y, z)
I want to be able to efficiently find the corresponding z-value for a given (x,y) pair. My research so far leads to more questions. Here's what I've got:
Iterate through all of the triplets:
query = (a, b)  # where a, b are the x and y coordinates we're looking for
for i in triplets:
    if i[0] == query[0] and i[1] == query[1]:
        result = i[2]
Drawbacks: slow; a, b must exist, which is a problem with comparing floats.
Use scipy.spatial.cKDTree to find nearest:
points = triplets[:,0:2] # drops the z column
tree = cKDTree(points)
idx = tree.query((a, b))[1] # this returns a tuple, we want the index
query = tree.data[idx]
result = triplets[idx, 2]
Drawbacks: returns nearest point rather than interpolating.
Using interp2d as per comment:
f = interp2d(x, y, z)
result = f(a, b)
Drawbacks: doesn't work on a large dataset. I get OverflowError: Too many data points to interpolate when run on real data. (My real data is some 11 million points.)
So the question is: is there any straightforward way of doing this that I'm overlooking? Are there ways to reduce the drawbacks of the above?
If you want to interpolate the result, rather than just find the z value for the nearest neighbour, I would consider doing something like the following:
Use a k-d tree to partition your data points according to their (x, y) coordinates
For a given (xi, yi) point to interpolate, find its k nearest neighbours
Take the average of their z values, weighted according to their distance from (xi, yi)
The code might look something like this:
import numpy as np
from scipy.spatial import cKDTree
# some fake (x, y, z) data
XY = np.random.rand(10000, 2) - 0.5
Z = np.exp(-((XY ** 2).sum(1) / 0.1) ** 2)
# construct a k-d tree from the (x, y) coordinates
tree = cKDTree(XY)
# a random point to query
xy = np.random.rand(2) - 0.5
# find the k nearest neighbours (say, k=3)
distances, indices = tree.query(xy, k=3)
# the z-values for the k nearest neighbours of xy
z_vals = Z[indices]
# take the average of these z-values, weighted by 1 / distance from xy
dw_avg = np.average(z_vals, weights=(1. / distances))
It's worth playing around a bit with the value of k, the number of nearest neighbours to take the average of. This is essentially a crude form of kernel density estimation, where the value of k controls the degree of 'smoothness' you're imposing on the underlying distribution of z-values. A larger k results in more smoothness.
Similarly, you might want to play around with how you weight the contributions of points according to their distance from (xi, yi), depending on how you think similarity in z decreases with increasing x, y distance. For example you might want to weight by (1 / distances ** 2) rather than (1 / distances).
In terms of performance, constructing and searching k-d trees are both very efficient. Bear in mind that you only need to construct the tree once for your dataset, and if necessary you can query multiple points at a time by passing (N, 2) arrays to tree.query().
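As noted above, a sketch of querying many points at once (my addition), reusing the tree and Z from the example:
import numpy as np
from scipy.spatial import cKDTree

XY = np.random.rand(10000, 2) - 0.5
Z = np.exp(-((XY ** 2).sum(1) / 0.1) ** 2)
tree = cKDTree(XY)

# pass an (N, 2) array to query N points in one call
queries = np.random.rand(50, 2) - 0.5
distances, indices = tree.query(queries, k=3)  # both have shape (50, 3)

# inverse-distance-weighted average of the k neighbours, per query point
weights = 1.0 / distances
dw_avg = (Z[indices] * weights).sum(axis=1) / weights.sum(axis=1)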
Tools for approximate nearest neighbour searches, such as FLANN, might potentially be quicker, but these are usually more helpful in situations when the dimensionality of your data is very high.
I don't understand your cKDTree code: you already have the idx, so why do the for loop again? You can get the result just by result = triplets[idx, 2].
import numpy as np
from scipy.spatial import cKDTree

x = np.arange(20, 40, dtype=np.float64)
y = np.arange(30, 50, dtype=np.float64)
z = np.random.random(20) * 25.0
triplets = np.column_stack((x, y, z))

a = 30.1
b = 40.5
points = triplets[:, 0:2]  # drops the z column
tree = cKDTree(points)
idx = tree.query((a, b))[1]  # query returns (distance, index); we want the index
result = triplets[idx, 2]
You can create a sparse matrix and use simple indexing. Note that the row and column arrays must be integer indices, so this only works when your x and y values are (castable to) integers:
In [1]: import numpy as np
In [2]: x = np.arange(20, 40, dtype=np.float64)
In [3]: y = np.arange(30, 50, dtype=np.float64)
In [4]: z = np.random.random(20) * 25.0
In [5]: from scipy.sparse import coo_matrix
In [6]: m = coo_matrix((z, (x.astype(int), y.astype(int)))).tolil()
In [7]: m[25, 35]
Out[7]: 17.410532044604292
