I'm trying to write a function to take the derivative of any general function/array of numbers. Specifically, I am using a central difference formula. The issue is that I cannot compute the boundary points of the derivative, as the central difference formula uses indices that would be out of bounds. My code is below.
import numpy as np
n = 20000 # number of points in array
xs = np.linspace(start=-2*np.pi, stop=2*np.pi, num=n) # x values
y = np.array([np.sin(i) for i in xs]) # our function, sine
def deriv(f, h):
    """
    Calculate the numerical derivative of any function
    :param f: numpy.array(float), the array of numbers we differentiate
    :param h: step size
    :rtype d: numpy.array(float)
    """
    d = np.zeros_like(f)
    # this loop misses the first and last points in f
    for i in range(1, f.shape[0]-1):
        # 2-point formula
        d[i] = (f[i+1] - f[i-1])/(2*h)
    return d
h = abs(xs[0] - xs[1]) # step size
y1 = deriv(y, h) # first derivative
y2 = deriv(y1, h) # second derivative
y3 = deriv(y2, h) # third derivative
When I plot y, y1, y2 and y3 you can see that the derivatives blow up at the end points.
What I have tried to do is set the end points to their nearest neighbors in deriv as below. While this works for low order derivatives (1st and 2nd) it starts to break at higher order derivatives (3rd and greater).
...
d = np.zeros_like(f)
for i in range(1, f.shape[0]-1):
    d[i] = (f[i+1] - f[i-1])/(2*h)
d[0] = d[1]
d[-1] = d[-2]
...
The derivative in the middle, away from the boundaries, is computed fine. The issue is with the boundaries.
How should I treat the boundary conditions here? Would a different numerical differentiation scheme work better than the central difference scheme?
EDIT: I am looking for a general method to solve this, not just a method that can be applied to the sine function or any other periodic function as I have used to illustrate the issue here.
This is more a numerical methods question than a programming question.
Anyway, if your function has periodic boundary conditions (it looks like a sinusoidal wave, so in this case you have periodicity), just create a new array with 2 additional elements: the new array's start element will be the last element of the original array, and the new array's end element will be the start element of the original array. Here is a way to do it:
f_periodic = np.zeros(f.size+2)
f_periodic[1:-1], f_periodic[0], f_periodic[-1] = f, f[-1], f[0]
You can now differentiate f_periodic, for which d[1] and d[-2] will be the correct derivative values on the boundaries (disregard d[0] and d[-1]).
Edit after OP's new requirements...
For more general boundary conditions, say a specific value at the boundaries, there are different approaches one can follow:
Use ghost values:
Again, extend the function and extrapolate values for the new boundaries. Depending on the order of the numerical differentiation, more ghost cells will be required. For the current scheme, a simple linear extrapolation will do (only 1 ghost value at each boundary is required):
f_new = np.zeros(f.size+2)
f_new[1:-1] = f
f_new[-1] = f[-2] + (f[-2]-f[-3])/(x[-2]-x[-3])*(x[-1]-x[-2])
f_new[0] = f[1] + (f[1]-f[2])/(x[1]-x[2])*(x[0]-x[1])
Note that you also have to extend x. However, since you have constant spacing, you can just use h instead of the spatial differences, e.g. x[-2]-x[-3]. You can now differentiate f_new and you will get a 1st-order approximation of the derivative on the boundaries (since you used a linear extrapolation to find the ghost values).
Use forwards and backwards schemes on the boundaries
Basically, you need to differentiate using a boundary value and its neighbor: the value to the right (forwards) at the left boundary and the value to the left (backwards) at the right boundary. This is a first-order approximation.
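For concreteness, a minimal sketch of these one-sided differences, as a drop-in replacement for the d[0] = d[1] and d[-1] = d[-2] lines in the question's deriv:

d[0] = (f[1] - f[0]) / h     # forward difference at the left boundary, O(h)
d[-1] = (f[-1] - f[-2]) / h  # backward difference at the right boundary, O(h)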
You can use the forward and backward differentiation scheme of order 2 for the boundary points. Essentially, we know that
(f(x+h)-f(x))/h = f'(x) + h/2*f''(x) + O(h²) (I)
and
(f(x+2h) - 2f(x+h) + f(x))/h² = f''(x+h) + O(h²) = f''(x) + O(h) (II)
Use the last to replace the first order term with the second derivative, that is, compute (I)-h/2*(II) to get
(-1/2*f(x+2h) + 2*f(x+h) -3/2*f(x))/h = f'(x) + O(h²)
Note that the O(h²) error in the first derivative will in general lead to an O(h) error in the second iteration of the divided differences and O(1) in the third. One may argue that the error terms cancel suitably, but that will only happen for the inner points; the one-sided derivatives will "spoil" that pattern with increasing distance from the boundary.
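A minimal sketch of a deriv variant using this second-order one-sided formula at both boundaries (the function name is mine; the backward formula mirrors the forward one with h replaced by -h):

import numpy as np

def deriv_order2(f, h):
    d = np.zeros_like(f)
    d[1:-1] = (f[2:] - f[:-2]) / (2*h)             # central difference, O(h^2)
    d[0] = (-0.5*f[2] + 2*f[1] - 1.5*f[0]) / h     # forward one-sided, O(h^2)
    d[-1] = (0.5*f[-3] - 2*f[-2] + 1.5*f[-1]) / h  # backward one-sided, O(h^2)
    return d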
Related
I'm facing a problem while trying to implement the coupled differential equation below (also known as single-mode coupling equation) in Python 3.8.3. As for the solver, I am using Scipy's function scipy.integrate.solve_bvp, whose documentation can be read here. I want to solve the equations in the complex domain, for different values of the propagation axis (z) and different values of beta (beta_analysis).
The problem is that it is extremely slow (not manageable) compared with an equivalent implementation in Matlab using the functions bvp4c, bvpinit and bvpset. Evaluating the first few iterations of both executions, they return the same result, except for the resulting mesh which is a lot greater in the case of Scipy. The mesh sometimes even saturates to the maximum value.
The equation to be solved is shown below, along with the boundary conditions function.
import h5py
import numpy as np
from scipy import integrate
def coupling_equation(z_mesh, a):
    ka_z = k  # Global
    z_a = z  # Global
    a_p = np.empty_like(a).astype(complex)
    for idx, z_i in enumerate(z_mesh):
        beta_zf_i = np.interp(z_i, z_a, beta_zf)  # Get beta at the desired point of the mesh
        ka_z_i = np.interp(z_i, z_a, ka_z)  # Get ka at the desired point of the mesh
        coupling_matrix = np.empty((2, 2), complex)
        coupling_matrix[0] = [-1j * beta_zf_i, ka_z_i]
        coupling_matrix[1] = [ka_z_i, 1j * beta_zf_i]
        a_p[:, idx] = np.matmul(coupling_matrix, a[:, idx])  # Apply the coupling matrix
    return a_p

def boundary_conditions(a_a, a_b):
    return np.hstack(((a_a[0]-1), a_b[1]))
Moreover, I couldn't find a way to pass k, z and beta_zf as arguments of the function coupling_equation, given that the fun argument of the solve_bvp function must be a callable with the parameters (x, y). My approach is to define some global variables, but I would appreciate any help on this too if there is a better solution.
The analysis function which I am trying to code is:
def analysis(k, z, beta_analysis, max_mesh):
    s11_analysis = np.empty_like(beta_analysis, dtype=complex)
    s21_analysis = np.empty_like(beta_analysis, dtype=complex)
    initial_mesh = np.linspace(z[0], z[-1], 10)  # Initial mesh of 10 samples along L
    mesh = initial_mesh
    # a_init must be complex in order to solve the problem in a complex domain
    a_init = np.vstack((np.ones(np.size(initial_mesh)).astype(complex),
                        np.zeros(np.size(initial_mesh)).astype(complex)))
    for idx, beta in enumerate(beta_analysis):
        print(f"Iteration {idx}: beta_analysis = {beta}")
        global beta_zf
        beta_zf = beta * np.ones(len(z))  # Global variable so as to use it in coupling_equation(x, y)
        a = integrate.solve_bvp(fun=coupling_equation,
                                bc=boundary_conditions,
                                x=mesh,
                                y=a_init,
                                max_nodes=max_mesh,
                                verbose=1)
        # mesh = a.x  # Mesh for the next iteration
        # a_init = a.y  # Initial guess for the next iteration, corresponding to the current solution
        s11_analysis[idx] = a.y[1][0]
        s21_analysis[idx] = a.y[0][-1]
    return s11_analysis, s21_analysis
I suspect that the problem has something to do with the initial guess that is being passed to the different iterations (see the commented lines inside the loop in the analysis function). I tried to set the solution of an iteration as the initial guess for the following one (which should reduce the time needed by the solver), but it is even slower, which I don't understand. Maybe I missed something, because it is my first time trying to solve differential equations.
The parameters used for the execution are the following:
f2 = h5py.File(r'path/to/file', 'r')
k = np.array(f2['k']).squeeze()
z = np.array(f2['z']).squeeze()
f2.close()
analysis_points = 501
max_mesh = 1e6
beta_0 = 3e2
beta_low = 0  # Lower value of the frequency for the analysis
beta_up = beta_0  # Upper value of the frequency for the analysis
beta_analysis = np.linspace(beta_low, beta_up, analysis_points)
s11_analysis, s21_analysis = analysis(k, z, beta_analysis, max_mesh)
Any ideas on how to improve the performance of these functions? Thank you all in advance, and sorry if the question is not well-formulated, I accept any suggestions about this.
Edit: Added some information about performance and sizing of the problem.
In practice, I can't find a relation that determines the number of times coupling_equation is called. It must be a matter of the internal operation of the solver. I checked the number of calls in one iteration by printing a line, and it happened on 133 occasions (this was one of the fastest). This must be multiplied by the number of iterations over beta. For the analyzed one, the solver returned this:
Solved in 11 iterations, number of nodes 529.
Maximum relative residual: 9.99e-04
Maximum boundary residual: 0.00e+00
The shapes of a and z_mesh are correlated, since z_mesh is a vector whose length corresponds with the size of the mesh, recalculated by the solver each time it calls coupling_equation. Given that a contains the amplitudes of the progressive and regressive waves at each point of z_mesh, the shape of a is (2, len(z_mesh)).
In terms of computation times, I only managed to achieve 19 iterations in about 2 hours with Python. In this case, the initial iterations were faster, but they start to take more time as their mesh grows, until the point where the mesh saturates to the maximum allowed value. I think this is because of the value of the input coupling coefficients at that point, because it also happens when no loop over beta_analysis is executed (just the solve_bvp function for the intermediate value of beta). Instead, Matlab managed to return a solution for the entire problem in just 6 minutes, approximately. If I pass the result of the last iteration as initial_guess (commented lines in the analysis function), the mesh overflows even faster and it is impossible to get more than a couple of iterations.
Based on semi-random inputs, we can see that max_mesh is sometimes reached. This means that coupling_equation can be called with quite big z_mesh and a arrays. The problem is that coupling_equation contains a slow pure-Python loop iterating over each column of the arrays. You can speed the computation up a lot using Numpy vectorization. Here is an implementation:
def coupling_equation_fast(z_mesh, a):
    ka_z = k  # Global
    z_a = z  # Global
    a_p = np.empty(a.shape, dtype=np.complex128)
    beta_zf_i = np.interp(z_mesh, z_a, beta_zf)  # Get beta at the desired points of the mesh
    ka_z_i = np.interp(z_mesh, z_a, ka_z)  # Get ka at the desired points of the mesh
    # Fast manual matrix multiplication
    a_p[0] = (-1j * beta_zf_i) * a[0] + ka_z_i * a[1]
    a_p[1] = ka_z_i * a[0] + (1j * beta_zf_i) * a[1]
    return a_p
This code provides a similar output with semi-random inputs compared to the original implementation but is roughly 20 times faster on my machine.
Furthermore, I do not know whether max_mesh happens to be big with your inputs too, or whether this is normal/intended. It may make sense to decrease the value of max_mesh in order to reduce the execution time even more.
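On the side question about avoiding globals: solve_bvp only requires a callable of the form fun(x, y), so the extra parameters can be bound with functools.partial (or a lambda) instead of globals. A sketch under that assumption, reusing the vectorized body from above:

from functools import partial
import numpy as np
from scipy import integrate

def coupling_equation_fast(z_mesh, a, k, z, beta_zf):
    # Same vectorized body as above, with the former globals as parameters
    a_p = np.empty(a.shape, dtype=np.complex128)
    beta_zf_i = np.interp(z_mesh, z, beta_zf)
    ka_z_i = np.interp(z_mesh, z, k)
    a_p[0] = (-1j * beta_zf_i) * a[0] + ka_z_i * a[1]
    a_p[1] = ka_z_i * a[0] + (1j * beta_zf_i) * a[1]
    return a_p

# Bind the extra arguments once; solve_bvp still sees a callable(x, y)
fun = partial(coupling_equation_fast, k=k, z=z, beta_zf=beta_zf)
a = integrate.solve_bvp(fun=fun, bc=boundary_conditions, x=mesh, y=a_init,
                        max_nodes=max_mesh, verbose=1)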
I want to implement ifft2 using DFT matrix. The following code works for fft2.
import numpy as np
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    W = np.power(omega, i * j)  # Normalization by sqrt(N) not included
    return W
sizeM=40
sizeN=20
np.random.seed(0)
rA=np.random.rand(sizeM,sizeN)
rAfft=np.fft.fft2(rA)
dftMtxM=DFT_matrix(sizeM)
dftMtxN=DFT_matrix(sizeN)
# Matrix multiply the 3 matrices together
mA = dftMtxM @ rA @ dftMtxN
print(np.allclose(np.abs(mA), np.abs(rAfft)))
print(np.allclose(np.angle(mA), np.angle(rAfft)))
To get to ifft2 I assumed I only need to change the DFT matrix to its transpose, so I expected the following to work, but I got False for the last two prints. Any suggestions?
import numpy as np
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    W = np.power(omega, i * j)  # Normalization by sqrt(N) not included
    return W
sizeM=40
sizeN=20
np.random.seed(0)
rA=np.random.rand(sizeM,sizeN)
rAfft=np.fft.ifft2(rA)
dftMtxM=np.conj(DFT_matrix(sizeM))
dftMtxN=np.conj(DFT_matrix(sizeN))
# Matrix multiply the 3 matrices together
mA = dftMtxM @ rA @ dftMtxN
print(np.allclose(np.abs(mA), np.abs(rAfft)))
print(np.allclose(np.angle(mA), np.angle(rAfft)))
I am going to be building on some things from my answer to your previous question. Please note that I will try to distinguish between the terms Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT). Remember that the DFT is the transform, while the FFT is only an efficient algorithm for performing it. People, including myself, however, very commonly refer to the DFT as FFT, since it is practically the only algorithm used for computing the DFT.
The problem here is again the normalization of the data. It's interesting that this is such a fundamental and confusing part of any DFT operation, yet I couldn't find a good explanation on the internet. I will try to provide a summary about DFT normalization at the end; however, I think the best way to understand this is by working through some examples yourself.
Why do the comparisons fail?
It's important to note that even though both of the allclose tests seemingly fail, they are actually not a very good method of comparing two arrays of complex numbers.
Difference between two angles
In particular, the problem arises when comparing angles. If you just take the difference of two close angles that are on the border between -pi and pi, you can get a value that is around 2*pi. allclose just takes differences between values and checks that they are below some threshold. Thus, in our case, it can report a false negative.
A better way to compare angles is something along the lines of this function:
def angle_difference(a, b):
    diff = a - b
    diff[diff < -np.pi] += 2*np.pi
    diff[diff > np.pi] -= 2*np.pi
    return diff
You can then take the maximum absolute value and check that it's below some threshold:
np.max(np.abs(angle_difference(np.angle(mA), np.angle(rAfft)))) < threshold
In the case of your example, the maximum difference was 3.072209153742733e-12.
So the angles are actually correct!
Magnitude scaling
We can get an idea of the issue when we look at the magnitude ratio between the matrix iDFT and the library iFFT.
print(np.abs(mA)/np.abs(rAfft))
We find that all the values of this ratio are 800, which means that our magnitudes are 800 times larger than those computed by the library. Suspiciously, 800 = 40 * 20, the dimensions of our data! I think you can see where I am going with this.
Confusing DFT normalization
We spot some indications why this is the case when we have a look at the DFT formulas from the Numpy FFT documentation (reproduced here in plain text):

A_k = sum_{m=0..N-1} a_m * exp(-2j*pi*m*k/N)          (forward)
a_m = (1/N) * sum_{k=0..N-1} A_k * exp(2j*pi*m*k/N)   (inverse)

You will notice that while the forward transform doesn't normalize by anything, the inverse transform scales the output by 1/N. These are the 1D transforms, but exactly the same thing applies in the 2D case: the inverse transform multiplies everything by 1/(N*M).
So in our example, if we update this line, we will get the magnitudes to agree:
mA = dftMtxM @ (rA / (sizeM * sizeN)) @ dftMtxN
A side note on comparing the outputs: an alternative way to compare complex numbers is to compare their real and imaginary components:
print(np.allclose(mA.real, rAfft.real))
print(np.allclose(mA.imag, rAfft.imag))
And we find that now indeed both methods agree.
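Putting it all together, a minimal self-contained check (a sketch) that the conjugated DFT matrices combined with the 1/(M*N) factor reproduce the library ifft2:

import numpy as np

def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    return np.power(np.exp(-2 * np.pi * 1j / N), i * j)

sizeM, sizeN = 40, 20
np.random.seed(0)
rA = np.random.rand(sizeM, sizeN)
# Conjugated DFT matrices plus the 1/(M*N) normalization give the inverse transform
mA = np.conj(DFT_matrix(sizeM)) @ (rA / (sizeM * sizeN)) @ np.conj(DFT_matrix(sizeN))
print(np.allclose(mA, np.fft.ifft2(rA)))  # expected: True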
Why all this normalization mess and which should I use?
The fundamental property the DFT must satisfy is that iDFT(DFT(x)) = x. When you work through the math, you find that the product of the two coefficients in front of the forward and inverse sums has to be 1/N.
There is also something called Parseval's theorem. In simple terms, it states that the energy in a signal (the sum of squared absolute values) can be computed in either the time domain or the frequency domain. For the unnormalized FFT, this boils down to the relationship sum(|x[n]|^2) = (1/N) * sum(|X[k]|^2).
Here is the function for computing the energy of a signal:
def energy(x):
    return np.sum(np.abs(x)**2)
You are basically faced with a choice about the 1/N factor:
You can put the 1/N before the DFT sum. This makes sense, as then the k=0 DC component will be equal to the average of the time domain values. However, you will have to multiply the energy in the frequency domain by N in order to match it with the time domain energy.
N = len(x)
X = np.fft.fft(x)/N # Compute the FFT scaled by `1/N`
# Energy related by `N`
np.allclose(energy(x), energy(X) * N) == True
# Perform some processing...
Y = X * H
y = np.fft.ifft(Y*N) # Compute the iFFT, remember to cancel out the built in `1/N` of ifft
You put the 1/N before the iDFT. This is, slightly counterintuitively, what most implementations, including Numpy, do. I could not find a definitive consensus on the reasoning behind this, but I think it has something to do with implementation efficiency. (If anyone has a better explanation for this, please leave it in the comments.) As shown in the equations earlier, the energy in the frequency domain has to be divided by N to match the time domain energy.
N = len(x)
X = np.fft.fft(x) # Compute the FFT without scaling
# Energy, related by 1/N
np.allclose(energy(x), energy(X) / N) == True
# Perform some processing...
Y = X * H
y = np.fft.ifft(Y) # Compute the iFFT with the build in `1/N`
You can split the 1/N by placing 1/sqrt(N) before each of the transforms making them perfectly symmetric. In Numpy, you can provide the parameter norm="ortho" to the fft functions which will make them use the 1/sqrt(N) normalization instead: np.fft.fft(x, norm="ortho") The nice property here is that the energy now matches in both domains.
X = np.fft.fft(x, norm='ortho') # Compute the FFT scaled by `1/sqrt(N)`
# Perform some processing...
# Energy are equal:
np.allclose(energy(x), energy(X)) == True
Y = X * H
y = np.fft.ifft(Y, norm='ortho') # Compute the iFFT, with scaling by `1/sqrt(N)`
In the end it boils down to what you need. Most of the time, the absolute magnitude of your DFT is actually not that important. You are mostly interested in the ratio of various components, or you want to perform some operation in the frequency domain and then transform back to the time domain, or you are interested in the phase (angles). In all of these cases, the normalization does not really play an important role, as long as you stay consistent.
The problem:
Consider a system with a mass and a spring, as shown in the picture below. The stiffness of the spring and the mass of the object are known. Therefore, if the spring is stretched, the force the spring exerts can be calculated from Hooke's law, and the instantaneous acceleration can be estimated from Newton's laws of motion. Integrating the acceleration twice yields the distance the spring moves, and subtracting that from the initial length gives a new position from which to calculate the acceleration and start the loop again. Therefore, as the acceleration decreases linearly, the speed levels off at a certain value (top right). Everything after that point (the spring compressing and decelerating) is neglected for this case.
My question is how one would go about coding that up in Python. So far I have written some pseudocode.
instantaneous_acceleration = lambda x: 5*x/10  # a = kx/m
delta_time = 0.01  # 10 milliseconds
a[0] = instantaneous_acceleration(12)  # initial acceleration when stretched to 12 m
v[0] = 0  # initial velocity 0 m/s
s[0] = 12  # initial length 12 m
i = 1
while a[i] > 12:
    v[i] = a[i-1]*delta_time + v[i-1]  # calculate the next velocity
    s[i] = v[i]*delta_time + s[i-1]  # calculate the next position
    a[i] = instantaneous_acceleration(s[i])  # use the position to derive the new acceleration
    i = i + 1
Any help or tips are greatly appreciated.
If you're going to integrate up front - which is a good idea and absolutely the way to go when you can - then you can just write down the equations as functions of t for everything:
x'' = -kx/m
x'' + (k/m)x = 0
r^2 + k/m = 0
r^2 = -(k/m)
r = i*sqrt(k/m)
x(t) = A*e^(i*sqrt(k/m)t)
     = A*cos(sqrt(k/m)t + B) + i*A*sin(sqrt(k/m)t + B)
     = A*cos(sqrt(k/m)t + B)    (keeping the real part)
From initial conditions we know that
x(0) = 12 = A*cos(B)
v(0) = 0 = -sqrt(k/m)*A*sin(B)
The second of these equations is true only if we choose A = 0, B = 0 or B = Pi.
if A = 0, then the first equation has no solution.
if B = 0, the first equation has solution A = 12.
if B = Pi, the first equation has solution A = -12.
We probably prefer B = 0 and A = 12. This gives
x(t) = 12*cos(sqrt(k/m)t)
v(t) = -12*sqrt(k/m)*sin(sqrt(k/m)t)
a(t) = -12*(k/m)cos(sqrt(k/m)t)
Thus, at any incremental time t[n+1] = t[n] + dt, we can simply calculate the precise position, velocity and acceleration at that time without any drift or inaccuracy ever accumulating.
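A minimal sketch of evaluating these closed-form expressions on a time grid (the values k = 5 and m = 10 are assumptions taken from the question's pseudocode):

import numpy as np

k, m = 5.0, 10.0            # spring constant and mass (assumed from the question)
w = np.sqrt(k / m)          # angular frequency sqrt(k/m)
t = np.arange(0, 20, 0.01)  # time grid

x = 12 * np.cos(w * t)           # exact position
v = -12 * w * np.sin(w * t)      # exact velocity
a = -12 * w**2 * np.cos(w * t)   # exact acceleration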
All that said, if you are interested in how to numerically find x(t), v(t) and a(t) given an arbitrary ordinary differential equation, the answer is much harder. There are lots of good ways of doing what is called numerical integration. Euler's method is the easiest:
// initial conditions
t[0] = 0
x[0] = …
x'[0] = …
…
x^(n-1)[0] = …
x^(n)[0] = 0
// iterative step
x^(n)[k+1] = f(x^(n-1)[k], …, x'[k], x[k], t[k])
x^(n-1)[k+1] = x^(n-1)[k] + dt * x^(n)[k]
…
x'[k+1] = x'[k] + dt * x''[k]
x[k+1] = x[k] + dt * x'[k]
t[k+1] = t[k] + dt
The smaller the value of dt you choose, the longer it takes to run for a fixed duration of simulated time, but the more accurate the results you get. This is basically doing a Riemann sum of the function and all its derivatives up to the highest one involved in the ODE.
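For the spring equation x'' = -(k/m)x, the scheme above reduces to two first-order updates; a minimal Python sketch (again assuming k = 5, m = 10 and x(0) = 12 from the question):

import numpy as np

k, m = 5.0, 10.0
dt, N = 0.01, 1000
x = np.zeros(N + 1); v = np.zeros(N + 1)
x[0], v[0] = 12.0, 0.0
for n in range(N):
    a = -(k / m) * x[n]          # acceleration from the ODE
    v[n + 1] = v[n] + dt * a     # Euler step for the velocity
    x[n + 1] = x[n] + dt * v[n]  # Euler step for the position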
A more accurate version of this, the trapezoidal rule, does the same thing but takes the average of the values at both endpoints of the last time quantum (rather than a single endpoint's value; the example above uses the beginning of the interval). The average value over the interval is guaranteed to be closer to the true value over the interval than either endpoint (unless the function is constant over that interval, in which case the average is at least as good).
Probably the best standard numerical integration methods for ODEs (assuming you don't need something like leapfrog methods for greater stability) are the Runge-Kutta methods. An adaptive-timestep Runge-Kutta method of sufficient order should usually do the trick and give you accurate answers. Unfortunately, the mathematics behind the Runge-Kutta methods is probably too advanced and time-consuming to cover here, but you can find information on these and other advanced techniques online or in e.g. Numerical Recipes, a series of books on numerical methods that contains lots of very useful code samples.
Even the Runge-Kutta methods work basically by refining the guess at the function's value over the time quantum, though. They just do it in more sophisticated ways which provably reduce the error at each step.
You have a sign error in the force: for a spring, or any other oscillation, it should always be opposite to the excitation direction. Correcting this instantly gives an oscillation. However, your loop condition will now never be satisfied, so you have to adapt that as well.
You can immediately increase the order of your method by elevating it from the current symplectic Euler method to Leapfrog-Verlet. You only have to change the interpretation of v[i] to be the velocity at t[i]-dt/2. Then the first update uses the acceleration in the middle at t[i-1] to compute the velocity at t[i-1]+dt/2 = t[i]-dt/2 from the velocity at t[i-1]-dt/2 with a midpoint formula. The position update in the next line is a similar midpoint formula, using the velocity at the middle time between the position times. All you have to change in the code to get this advantage is to set the initial velocity to the one at time t[0]-dt/2, using the Taylor expansion at t[0].
import numpy as np
import matplotlib.pyplot as plt

instantaneous_acceleration = lambda x: -5*x/10  # a = -kx/m
delta_time = 0.01  # 10 milliseconds
s0, v0 = 12, 0  # initial length 12 m, initial velocity 0 m/s
N = 1000

s = np.zeros(N+1); v = s.copy(); a = s.copy()
a[0] = instantaneous_acceleration(s0)  # initial acceleration when stretched to 12 m
v[0] = v0 - a[0]*delta_time/2  # velocity at t[0]-dt/2, via the Taylor expansion at t[0]
s[0] = s0
for i in range(N):
    v[i+1] = a[i]*delta_time + v[i]  # calculate the next velocity
    s[i+1] = v[i+1]*delta_time + s[i]  # calculate the next position
    a[i+1] = instantaneous_acceleration(s[i+1])  # use the position to derive the new acceleration

# produce plots of all these functions
t = np.arange(0, N+1)*delta_time
fig, ax = plt.subplots(3, 1, figsize=(5, 3*1.5))
for g, y in zip(ax, (s, v, a)):
    g.plot(t, y); g.grid()
plt.tight_layout(); plt.show()
This is obviously and correctly an oscillation. The exact solution is 12*cos(sqrt(0.5)*t); using it and its derivatives to compute the errors in the numerical solution (remember the leap-frogging of the velocities) gives, via
w = 0.5**0.5; dt = delta_time

fig, ax = plt.subplots(3, 1, figsize=(5, 3*1.5))
for g, y in zip(ax, (s-12*np.cos(w*t), v+12*w*np.sin(w*(t-dt/2)), a+12*w**2*np.cos(w*t))):
    g.plot(t, y); g.grid()
plt.tight_layout(); plt.show()
the plot below, showing errors of the expected size delta_time**2.
An analytical approach is the simplest way to obtain the velocity of a simple system that obeys Hooke's law.
However, if you desire a physically accurate numerical/iterative approach, I strongly advise against methods like the standard Euler or Runge-Kutta methods (suggested by Patrick87). [Correction: the OP's method is a symplectic 1st-order method, if the sign of the acceleration term is corrected.]
You probably want to use a Hamiltonian approach and a suitable symplectic integrator, such as the second-order leapfrog (also suggested by Patrick87).
For Hooke's law, you can express the Hamiltonian H = T(p) + V(q), where p is the momentum (associated with velocity) and q is the position (associated with how far the spring is from equilibrium).
You have the kinetic energy T and potential energy V
T(p) = 0.5*p^2/m
V(q) = 0.5*k*q^2
You simply need the derivatives of these two expressions to simulate the system
dT/dp = p/m
dV/dq = k*q
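A minimal sketch of a first-order symplectic Euler step built from exactly these two derivatives (the values k = 5, m = 10 and q(0) = 12 are assumptions borrowed from the question):

k, m = 5.0, 10.0      # spring constant and mass
dt, N = 0.01, 1000
q, p = 12.0, 0.0      # initial position and momentum
for _ in range(N):
    p -= dt * k * q   # kick: p -= dt * dV/dq
    q += dt * p / m   # drift: q += dt * dT/dp (with the updated p)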
I provided a detailed example (although for another, 2-dimensional system), including an implementation of a 1st- and a 4th-order method, here:
https://zymplectic.com/case3.html under method 0 and method 1
These are symplectic integrators, which have an energy-preserving property that means you can perform long simulations without dissipative errors.
I am having trouble understanding the output of my function implementing multiple ridge regression. I am doing this from scratch in Python for the closed form of the method, w = (X^T X + lambda*I)^(-1) X^T y.
I have a training set X that is 100 rows x 10 columns and a vector y that is 100x1.
My attempt is as follows:
def ridgeRegression(xMatrix, yVector, lambdaRange):
    wList = []
    for i in range(1, lambdaRange+1):
        lambVal = i
        # compute the inner values (X.T X + lambda I)
        xTranspose = np.transpose(xMatrix)
        xTx = xTranspose @ xMatrix
        lamb_I = lambVal * np.eye(xTx.shape[0])
        # invert inner, e.g. (inner)**(-1)
        inner_matInv = np.linalg.inv(xTx + lamb_I)
        # compute outer (X.T y)
        outer_xTy = np.dot(xTranspose, yVector)
        # multiply together
        w = inner_matInv @ outer_xTy
        wList.append(w)
    print(wList)
For testing, I am running it with the first 5 lambda values.
wList becomes 5 numpy arrays, each of length 10 (I'm assuming one per coefficient).
Here is the first of those 5 arrays:
array([ 0.29686755, 1.48420319, 0.36388528, 0.70324668, -0.51604451,
2.39045735, 1.45295857, 2.21437745, 0.98222546, 0.86124358])
My questions, and points I'd like clarified:
Shouldn't there be 11 coefficients, (1 for the y-intercept + 10 slopes)?
How do I get the Minimum Square Error from this computation?
What comes next if I wanted to plot this line?
I think I am just really confused as to what I'm looking at, since I'm still working on my linear algebra.
Thanks!
First, I would modify your ridge regression to look like the following:
import numpy as np

def ridgeRegression(X, y, lambdaRange):
    wList = []
    # Get normal form of `X`
    A = X.T @ X
    # Get identity matrix
    I = np.eye(A.shape[0])
    # Get right hand side
    c = X.T @ y
    for lambVal in range(1, lambdaRange+1):
        # Set up equations Bw = c
        lamb_I = lambVal * I
        B = A + lamb_I
        # Solve for w
        w = np.linalg.solve(B, c)
        wList.append(w)
    return wList
Notice that I replaced your inv call, which explicitly computes the matrix inverse, with an implicit solve. This is much more numerically stable, which is an especially important consideration for these types of problems.
I've also taken the A = X.T @ X computation, the identity matrix I generation, and the right-hand-side vector c = X.T @ y computation out of the loop; these don't change within the loop and are relatively expensive to compute.
As was pointed out by @qwr, the number of columns of X will determine the number of coefficients you have. You have not described your model, so it's not clear how the underlying domain, x, is structured into X.
Traditionally, one might use polynomial regression, in which case X is the Vandermonde Matrix. In that case, the first coefficient would be associated with the y-intercept. However, based on the context of your question, you seem to be interested in multivariate linear regression. In any case, the model needs to be clearly defined. Once it is, then the returned weights may be used to further analyze your data.
Typically, to make notation more compact, the matrix X contains a column of ones for an intercept, so if you have p predictors, the matrix has dimensions n by p+1. See the Wikipedia article on linear regression for an example.
To compute the in-sample MSE, use the definition of MSE: the average of squared residuals. To compute the generalization error, you need cross-validation.
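For instance, a one-line sketch (assuming X, y and the wList returned above):

mse_per_lambda = [np.mean((y - X @ w)**2) for w in wList]  # in-sample MSE for each lambda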
Also, you shouldn't take lambVal as an integer. It can be small (close to 0) if the aim is just to avoid numerical error when xTx is ill-conditioned.
I would advise you to use a logarithmic range instead of a linear one, starting from 0.001 and going up to 100 or more if you want to. For instance, you can change your code to this:
powerMin = -3
powerMax = 3
for i in range(powerMin, powerMax):
    lambVal = 10**i
    print(lambVal)
And then you can try a smaller range or a linear range once you figure out the correct order of magnitude of lambVal for your data from cross-validation.
Suppose I want to find the "intersection point" of 2 arbitrary high-dimensional lines. The two lines won't actually intersect, but I still want to find the most intersecting point (i.e. a point that is as close to all lines as possible).
Suppose those lines have direction vectors A, B and initial points C, D.
I can find the most intersecting point by simply setting up a linear least-squares problem: converting the line-intersection equation
Ax + C = By + D
to least-square form
[A, -B] @ [[x, y]] = D - C
where @ stands for matrix-vector multiplication; I can then use e.g. np.linalg.lstsq to solve it.
But how can I find the "most intersecting point" of 3 or more arbitrary lines? If I follow the same rule, I now have
Ax + D = By + E = Cz + F
The only way I can think of is decomposing this into three equations:
Ax + D = By + E
Ax + D = Cz + F
By + E = Cz + F
and converting them to least-square form
[A, -B,  0]                 [E - D]
[A,  0, -C] @ [[x, y, z]] = [F - D]
[0,  B, -C]                 [F - E]
The problem is that the size of the least-squares problem increases quadratically with the number of lines. I'm wondering: is there a more efficient way to solve this n-way-equal least-squares linear problem?
I was thinking about whether By + E = Cz + F above is even necessary, given the other two equations. But since this problem does not have an exact solution (i.e. the lines don't actually intersect), I believe dropping it would put more "weight" on some variables?
Thank you for your help!
EDIT
I just tested pairing the first term with all other terms in the n-way equality (and no other pairs) using the following code:
def lineIntersect(k, b):
    "k, b: N-by-D matrices describing N D-dimensional lines: k[i] * x + b[i]"
    # Convert the problem to least-square form `Ax = B`
    # A is temporarily defined 3-dimensional for convenience
    A = np.zeros((len(k)-1, k.shape[1], len(k)), k.dtype)
    A[:,:,0] = k[0]
    A[range(len(k)-1),:,range(1,len(k))] = -k[1:]
    # Convert to 2-dimensional matrix by flattening first two dimensions
    A = A.reshape(-1, len(k))
    # B should be 1-dimensional vector
    B = (b[1:] - b[0]).ravel()
    x = np.linalg.lstsq(A, B, None)[0]
    return (x[:,None] * k + b).mean(0)
The result below indicates that doing so is not correct, because the first term in the n-way equality is "weighted differently".
The first output is the difference between the regular result and the result with a different input order (line order should not matter) where the first term did not change.
The second output is the same, but with the first term changed.
k = np.random.rand(10, 100)
b = np.random.rand(10, 100)
print(np.linalg.norm(lineIntersect(k, b) - lineIntersect(np.r_[k[:1],k[:0:-1]], np.r_[b[:1],b[:0:-1]])))
print(np.linalg.norm(lineIntersect(k, b) - lineIntersect(k[::-1], b[::-1])))
results in
7.889616961715915e-16
0.10702479853076755
Another criterion for the "almost intersection point" would be a point x such that the sum of the squares of the distances of x to the lines is as small as possible. Like your criterion, if the lines actually do intersect then the almost intersection point will be the actual intersection point. However, I think the sum-of-squared-distances criterion makes it straightforward to compute the point in question:
Suppose we represent a line by a point and a unit vector along the line. So if a line is represented by p,t then the points on the line are of the form
p + l*t for scalar l
The distance-squared of a point x from a line p,t is
(x-p)'*(x-p) - square( t'*(x-p))
If we have N lines p[i],t[i] then the sum of the distances squared from a point x is
Sum { (x-p[i])'*(x-p[i]) - square( t[i]'*(x-p[i])) }
Expanding this out I get the above to be
x'*S*x - 2*x'*V + K
where
S = N*I - Sum{ t[i]*t[i]'}
V = Sum{ p[i] - (t[i]'*p[i])*t[i] }
and K does not depend on x
Unless all the lines are parallel, S will be (strictly) positive definite and hence invertible, and in that case our sum of distances squared is
(x-inv(S)*V)'*S*(x-inv(S)*V) + K - V'*inv(S)*V
Thus the minimising x is
inv(S)*V
So the drill is: normalise your 'direction vectors' (and scale each point by the same factor as used to scale the direction), form S and V as above, solve
S*x = V for x
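A minimal numpy sketch of this recipe (the function name is mine; t must contain unit direction vectors):

import numpy as np

def closest_point_to_lines(p, t):
    # p, t: N-by-D arrays of base points and unit direction vectors
    N, D = p.shape
    S = N * np.eye(D) - t.T @ t                             # S = N*I - Sum t[i]*t[i]'
    V = (p - (t * p).sum(axis=1)[:, None] * t).sum(axis=0)  # V = Sum p[i] - (t[i]'*p[i])*t[i]
    return np.linalg.solve(S, V)                            # x = inv(S)*V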
This question might be better suited for the Math Stack Exchange. Also, does anyone have a good way of formatting math here? Sorry that it's hard to read; I did my best with Unicode.
EDIT: I misinterpreted what @ZisIsNotZis meant by the lines Ax+C, so disregard the next paragraph.
I'm not convinced that your method is stated correctly. Would you mind posting your code and a small example of the output (maybe in 2D with 3 or 4 lines so we can plot it)? When you're trying to find the intersection of two lines, shouldn't you do Ax+C = Bx+D? If you do Ax+C = By+D, you can pick some x on the first line and some y on the second line and satisfy both equations exactly, because here x and y should be the same size as A and B, which is the dimension of the space, rather than scalars.
There are many ways to understand the problem of finding a point that is as close to all lines as possible. I think the most natural one is that the sum of squares of the Euclidean distances to each line is minimized.
Suppose we have a line in R^n: c^Tz + d = 0 (where c is unit length) and another point x. Then the shortest vector from x to the line is: (I-cc^T)(x-d) so the square of the distance from x to the line is ║(I-cc^T)(x-d)║^2. We can find the closest point to the line by minimizing this distance. Note that this is a standard least squares problem of the form min_x ║b-Ax║_2.
Now, suppose we have lines given by c_i z + d_i for i=1,...,m. The squared distance d_i^2 from a point x to the i-th line is d_i^2 = ║(I-c_i c_i^T)(x-d_i)║_2^2. We now want to solve the problem min_x \sum_{i=1}^{m} d_i^2.
In matrix form we have:
      ║ ⎡ (I-c_1 c_1^T)(x-d_1) ⎤ ║
      ║ | (I-c_2 c_2^T)(x-d_2) | ║
min_x ║ |          ...         | ║
      ║ ⎣ (I-c_m c_m^T)(x-d_m) ⎦ ║_2
This is again in the form min_x ║b - Ax║_2 so there are good solvers available.
Each block has size n (the dimension of the space) and there are m blocks (the number of lines), so the system is mn by n. In particular, it is linear in the number of lines and quadratic in the dimension of the space.
It also has the advantage that if you add a line you simply add another block to the least squares system. This also offers the possibility of updating solutions iteratively as you add lines.
I'm not sure if there are special solvers for this type of least squares system. Note that each block is the identity minus a rank one matrix, so that might give some additional structure which can be used to speed things up. That said, I think using existing solvers will almost always work better than writing your own, unless you have quite a bit of background in numerical analysis or have a very specialized class of systems to solve.
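A minimal sketch of assembling and solving this stacked system with an existing solver (names are mine; c holds unit directions and d a point on each line):

import numpy as np

def almost_intersection(c, d):
    # c, d: m-by-n arrays; row i holds the unit direction and a point of line i
    m, n = c.shape
    I = np.eye(n)
    blocks = [I - np.outer(ci, ci) for ci in c]  # projectors I - c_i c_i^T
    A = np.vstack(blocks)                        # (m*n)-by-n system matrix
    b = np.concatenate([P @ di for P, di in zip(blocks, d)])
    return np.linalg.lstsq(A, b, rcond=None)[0]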
Not a solution, just some thoughts:
If line in nD space has parametric equation (with unit Dir vector)
L(t) = Base + Dir * t
then squared distance from point P to this line is
W = P - Base
Dist^2 = (W - (W.dot.Dir) * Dir)^2
If it is possible to write Min(Sum(Dist[i]^2)) in a form suitable for the LSQ method (taking partial derivatives with respect to every point coordinate), the resulting system might be solved for the (x1..xn) coordinate vector.
(The situation resembles a reversal of the usual LSQ: many points and a single line.)
You say that you have two "high-dimensional" lines. This implies that the matrix indicating the lines has many more columns than rows.
If this is the case and you can efficiently find a low-rank decomposition such that A=LRᵀ, then you can rewrite the solution of the least squares problem min ||Ax-y||₂ as x=(Rᵀ RLᵀ L)⁻¹ Lᵀ y.
If m is the number of lines and n the dimension of the lines, then this reduces the least-squares time complexity from O(mn²+nʷ) to O(nr²+mr²) where r=min(m,n).
The problem then is to find such a decomposition.