I'm implementing the PC algorithm in Python. This algorithm constructs the graphical model of an n-variate Gaussian distribution. The graphical model is basically the skeleton of a directed acyclic graph, which means that if a structure like:
(x1)---(x2)---(x3)
is in the graph, then x1 is independent of x3 given x2. More generally, if A is the adjacency matrix of the graph and A(i,j) = A(j,i) = 0 (there is a missing edge between i and j), then i and j are conditionally independent given all the variables that appear in any path from i to j. For statistical and machine learning purposes, it is possible to "learn" the underlying graphical model.
If we have enough observations of a jointly Gaussian n-variate random variable, we can use the PC algorithm, which works as follows:
given n as the number of variables observed, initialize the graph as G = K(n)
for each pair i, j of nodes:
    if there exists an edge e from i to j:
        look for the neighbours of i
        if j is in the neighbours of i, then remove j from the set of neighbours
        call the set of neighbours k
        TEST if i and j are independent given the set k; if TRUE:
            remove the edge e from i to j
This algorithm also computes the separating sets of the graph, which are used by another algorithm that constructs the DAG starting from the skeleton and the separation sets returned by the PC algorithm. This is what I've done so far:
import itertools

def _core_pc_algorithm(a, sigma_inverse):
    l = 0
    N = len(sigma_inverse[0])
    n = range(N)
    sep_set = [[set() for i in n] for j in n]
    act_g = complete(N)
    z = lambda m, i, j: -m[i][j] / ((m[i][i] * m[j][j]) ** 0.5)
    while l < N:
        for (i, j) in itertools.permutations(n, 2):
            adjacents_of_i = adj(i, act_g)
            if j not in adjacents_of_i:
                continue
            else:
                adjacents_of_i.remove(j)
            if len(adjacents_of_i) >= l:
                for k in itertools.combinations(adjacents_of_i, l):
                    if N - len(k) - 3 < 0:
                        return (act_g, sep_set)
                    if test(sigma_inverse, z, i, j, l, a, k):
                        act_g[i][j] = 0
                        act_g[j][i] = 0
                        sep_set[i][j] |= set(k)
                        sep_set[j][i] |= set(k)
        l = l + 1
    return (act_g, sep_set)
Here a is the tuning parameter alpha with which I test for conditional independence, and sigma_inverse is the inverse of the covariance matrix of the sampled observations. Moreover, my test is:
def test(sigma_inverse, z, i, j, l, a, k):
    def erfinv(x): # used to approximate the inverse of a Gaussian cumulative density function
        sgn = 1
        a = 0.147
        PI = numpy.pi
        if x < 0:
            sgn = -1
        temp = 2/(PI*a) + numpy.log(1-x**2)/2
        add_1 = temp**2
        add_2 = numpy.log(1-x**2)/a
        add_3 = temp
        rt1 = (add_1-add_2)**0.5
        rtarg = rt1 - add_3
        return sgn*(rtarg**0.5)
    def indep_test_ijK(K): # compute partial correlation of i and j given ONE conditioning variable K
        part_corr_coeff_ij = z(sigma_inverse,i,j) # partial correlation coefficient of i and j
        part_corr_coeff_iK = z(sigma_inverse,i,K) # partial correlation coefficient of i and K
        part_corr_coeff_jK = z(sigma_inverse,j,K) # partial correlation coefficient of j and K
        part_corr_coeff_ijK = (part_corr_coeff_ij - part_corr_coeff_iK*part_corr_coeff_jK)/((((1-part_corr_coeff_iK**2))**0.5) * (((1-part_corr_coeff_jK**2))**0.5)) # partial correlation coefficient of i and j given K
        return part_corr_coeff_ijK == 0 # i independent from j given K if partial_correlation(i,j)|K == 0 (under jointly gaussian assumption) [could check if abs is < alpha?]
    def indep_test():
        n = len(sigma_inverse[0])
        phi = lambda p: (2**0.5)*erfinv(2*p-1)
        root = (n-len(k)-3)**0.5
        return root*abs(z(sigma_inverse,i,j)) <= phi(1-a/2)
    if l == 0:
        return z(sigma_inverse,i,j) == 0 # i independent from j <=> partial_correlation(i,j) == 0 (under jointly gaussian assumption) [could check if abs is < alpha?]
    elif l == 1:
        return indep_test_ijK(k[0])
    elif l == 2:
        return indep_test_ijK(k[0]) and indep_test_ijK(k[1]) # ASSUMING THAT I,J ARE INDEPENDENT GIVEN Y,Z <=> I,J INDEPENDENT GIVEN Y AND I,J INDEPENDENT GIVEN Z
    else: # I have to use the independence test with the Fisher z function
        return indep_test()
Here z is a lambda that receives a matrix (the inverse of the covariance matrix), an integer i, and an integer j, and computes the partial correlation of i and j given all the rest of the variables with the following rule (which I read in my teacher's slides):
corr(i,j)|REST = -var^-1(i,j) / sqrt(var^-1(i,i) * var^-1(j,j))
The main core of this application is the indep_test() function:
def indep_test():
    n = len(sigma_inverse[0])
    phi = lambda p: (2**0.5)*erfinv(2*p-1)
    root = (n-len(k)-3)**0.5
    return root*abs(z(sigma_inverse,i,j)) <= phi(1-a/2)
This function implements a statistical test which uses Fisher's z-transform of estimated partial correlations. I am using this algorithm in two ways:
Generate data from a linear regression model and compare the learned DAG with the expected one
Read a dataset and learn the underlying DAG
In both cases I do not always get correct results, either because I know the DAG underlying a certain dataset, or because I know the generative model but it does not coincide with the one my algorithm learns. I am perfectly aware that this is a non-trivial task and I may have misunderstood theoretical concepts as well as made errors even in parts of the code I have omitted here; but first I'd like to know (from someone who is more experienced than me) whether the test I wrote is right, and also whether there are library functions that perform this kind of test. I tried searching but couldn't find any suitable function.
Now I get to the point. The most critical issue in the above code regards the following error:
sqrt(n-len(k)-3)*abs(z(sigma_inverse[i][j])) <= phi(1-alpha/2)
I was mistaking the meaning of n: it is not the size of the precision matrix but the total number of multivariate observations (in my case, 10000 instead of 5). Another wrong assumption is that z(sigma_inverse[i][j]) has to provide the partial correlation of i and j given all the rest. That's not correct: z is Fisher's transform on a proper subset of the precision matrix, which estimates the partial correlation of i and j given K. The correct test is the following:
if len(K) == 0: # CM is the correlation matrix; no conditioning variables (K has length 0)
    r = CM[i, j] # r is the correlation of i and j
elif len(K) == 1: # one conditioning variable; not very different from the previous version, except that I do not have to compute the correlation matrix since I start from it (pandas provides such a feature on a DataFrame)
    r = (CM[i, j] - CM[i, K] * CM[j, K]) / math.sqrt((1 - math.pow(CM[j, K], 2)) * (1 - math.pow(CM[i, K], 2))) # r is the partial correlation of i and j given K
else: # more than one conditioning variable
    CM_SUBSET = CM[np.ix_([i]+[j]+K, [i]+[j]+K)] # subset of the correlation matrix I'm looking for
    PM_SUBSET = np.linalg.pinv(CM_SUBSET) # precision matrix of the given subset
    r = -1 * PM_SUBSET[0, 1] / math.sqrt(abs(PM_SUBSET[0, 0] * PM_SUBSET[1, 1]))
r = min(0.999999, max(-0.999999, r))
res = math.sqrt(n - len(K) - 3) * 0.5 * math.log1p((2*r)/(1-r)) # Fisher's z-transform of the estimated partial correlation
return 2 * (1 - norm.cdf(abs(res))) # obtaining the two-sided p-value
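For reference, here is the corrected test wrapped into a self-contained function (a sketch assembled from the snippet above; the function name, signature, and the explicit alpha threshold are mine, not part of the original code):

import math
import numpy as np
from scipy.stats import norm

def gauss_ci_test(CM, n, i, j, K, alpha=0.05):
    # CM: correlation matrix of the data, n: number of observations,
    # K: list of conditioning variables. Returns True if independence
    # of i and j given K is NOT rejected at level alpha.
    K = list(K)
    if len(K) == 0:
        r = CM[i, j]
    elif len(K) == 1:
        kk = K[0]
        r = (CM[i, j] - CM[i, kk] * CM[j, kk]) / math.sqrt((1 - CM[j, kk]**2) * (1 - CM[i, kk]**2))
    else:
        CM_SUBSET = CM[np.ix_([i, j] + K, [i, j] + K)]
        PM_SUBSET = np.linalg.pinv(CM_SUBSET)  # precision matrix of the subset
        r = -PM_SUBSET[0, 1] / math.sqrt(abs(PM_SUBSET[0, 0] * PM_SUBSET[1, 1]))
    r = min(0.999999, max(-0.999999, r))
    res = math.sqrt(n - len(K) - 3) * 0.5 * math.log1p((2 * r) / (1 - r))  # Fisher's z
    p_value = 2 * (1 - norm.cdf(abs(res)))
    return p_value >= alpha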
I hope someone finds this helpful.
I have been trying to create a custom calculator for trigonometric functions. Aside from Chebyshev polynomials and/or the CORDIC algorithm, I have used Taylor series, which have been accurate to a few decimal places.
This is what I have created to calculate simple trigonometric functions without any modules:
from __future__ import division

def sqrt(n):
    ans = n ** 0.5
    return ans

def factorial(n):
    k = 1
    for i in range(1, n+1):
        k = i * k
    return k

def sin(d):
    pi = 3.14159265359
    n = 180 / int(d) # 180 degrees = pi radians
    x = pi / n # Converting degrees to radians
    ans = x - ( x ** 3 / factorial(3) ) + ( x ** 5 / factorial(5) ) - ( x ** 7 / factorial(7) ) + ( x ** 9 / factorial(9) )
    return ans

def cos(d):
    pi = 3.14159265359
    n = 180 / int(d)
    x = pi / n
    ans = 1 - ( x ** 2 / factorial(2) ) + ( x ** 4 / factorial(4) ) - ( x ** 6 / factorial(6) ) + ( x ** 8 / factorial(8) )
    return ans

def tan(d):
    ans = sin(d) / sqrt(1 - sin(d) ** 2)
    return ans
Unfortunately I could not find any sources that would help me interpret inverse trigonometric function formulas for Python. I have also tried raising sin(x) to the power of -1 (sin(x) ** -1), which didn't work as expected.
What would be the best way to do this in Python (by best, I mean the simplest, with accuracy similar to the Taylor series)? Is this possible with power series, or do I need to use the CORDIC algorithm?
The question is broad in scope, but here are some simple ideas (and code!) that might serve as a starting point for computing arctan. First, the good old Taylor series. For simplicity, we use a fixed number of terms; in practice, you might want to decide the number of terms to use dynamically based on the size of x, or introduce some kind of convergence criterion. With a fixed number of terms, we can evaluate efficiently using something akin to Horner's scheme.
def arctan_taylor(x, terms=9):
    """
    Compute arctan for small x via Taylor polynomials.

    Uses a fixed number of terms. The default of 9 should give good results for
    abs(x) < 0.1. Results will become poorer as abs(x) increases, becoming
    unusable as abs(x) approaches 1.0 (the radius of convergence of the
    series).
    """
    # Uses Horner's method for evaluation.
    t = 0.0
    for n in range(2*terms-1, 0, -2):
        t = 1.0/n - x*x*t
    return x * t
The above code gives good results for small x (say smaller than 0.1 in absolute value), but the accuracy drops off as x becomes larger, and for abs(x) > 1.0, the series never converges, no matter how many terms (or how much extra precision) we throw at it. So we need a better way to compute for larger x. One solution is to use argument reduction, via the identity arctan(x) = 2 * arctan(x / (1 + sqrt(1 + x^2))). This gives the following code, which builds on arctan_taylor to give reasonable results for a wide range of x (but beware possible overflow and underflow when computing x*x).
import math

def arctan_taylor_with_reduction(x, terms=9, threshold=0.1):
    """
    Compute arctan via argument reduction and Taylor series.

    Applies reduction steps until x is below `threshold`,
    then uses Taylor series.
    """
    reductions = 0
    while abs(x) > threshold:
        x = x / (1 + math.sqrt(1 + x*x))
        reductions += 1
    return arctan_taylor(x, terms=terms) * 2**reductions
Alternatively, given an existing implementation for tan, you could simply find a solution y to the equation tan(y) = x using traditional root-finding methods. Since arctan is already naturally bounded to lie in the interval (-pi/2, pi/2), bisection search works well:
def arctan_from_tan(x, tolerance=1e-15):
    """
    Compute arctan as the inverse of tan, via bisection search. This assumes
    that you already have a high quality tan function.
    """
    low, high = -0.5 * math.pi, 0.5 * math.pi
    while high - low > tolerance:
        mid = 0.5 * (low + high)
        if math.tan(mid) < x:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)
Finally, just for fun, here's a CORDIC-like implementation, which is really more appropriate for a low-level implementation than for Python. The idea here is that you precompute, once and for all, a table of arctan values for 1, 1/2, 1/4, etc., and then use those to compute general arctan values, essentially by computing successive approximations to the true angle. The remarkable part is that, after the precomputation step, the arctan computation involves only additions, subtractions, and multiplications by powers of 2. (Of course, those multiplications aren't any more efficient than any other multiplication at the level of Python, but closer to the hardware, this could potentially make a big difference.)
cordic_table_size = 60
cordic_table = [(2**-i, math.atan(2**-i))
                for i in range(cordic_table_size)]

def arctan_cordic(y, x=1.0):
    """
    Compute arctan(y/x), assuming x positive, via CORDIC-like method.
    """
    r = 0.0
    for t, a in cordic_table:
        if y < 0:
            r, x, y = r - a, x - t*y, y + t*x
        else:
            r, x, y = r + a, x + t*y, y - t*x
    return r
Each of the above methods has its strengths and weaknesses, and all of the above code can be improved in a myriad of ways. I encourage you to experiment and explore.
To wrap it all up, here are the results of calling the above functions on a small number of not-very-carefully-chosen test values, comparing with the output of the standard library math.atan function:
test_values = [2.314, 0.0123, -0.56, 168.9]
for value in test_values:
    print("{:20.15g} {:20.15g} {:20.15g} {:20.15g}".format(
        math.atan(value),
        arctan_taylor_with_reduction(value),
        arctan_from_tan(value),
        arctan_cordic(value),
    ))
Output on my machine:
1.16288340166519 1.16288340166519 1.16288340166519 1.16288340166519
0.0122993797673 0.0122993797673 0.0122993797673002 0.0122993797672999
-0.510488321916776 -0.510488321916776 -0.510488321916776 -0.510488321916776
1.56487573286064 1.56487573286064 1.56487573286064 1.56487573286064
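Since the original question asked about inverse trigonometric functions in general, note that arcsin and arccos can be obtained from any of the arctan implementations above via the standard identities arcsin(x) = arctan(x / sqrt(1 - x^2)) and arccos(x) = pi/2 - arcsin(x). A minimal sketch (my addition, building on arctan_taylor_with_reduction defined above):

def arcsin_from_arctan(x):
    # valid for -1 <= x <= 1; endpoints handled separately to avoid division by zero
    if x == 1.0:
        return 0.5 * math.pi
    if x == -1.0:
        return -0.5 * math.pi
    return arctan_taylor_with_reduction(x / math.sqrt(1.0 - x*x))

def arccos_from_arctan(x):
    # arccos(x) = pi/2 - arcsin(x)
    return 0.5 * math.pi - arcsin_from_arctan(x)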
The simplest way to do any inverse function is to use binary search.
definitions
Let's assume a function
x = g(y)
and we want to code its inverse:
y = f(x) = f(g(y))
on the ranges
x = <x0, x1>
y = <y0, y1>
bin search on floats
You can do it with integer math, accessing the mantissa bits as in here:
Any Faster RMS Value Calculation in C?
but if you do not know the exponent of the result prior to computation, then you need to use floats for the binary search too.
The idea behind the binary search is to change the mantissa of y from y1 to y0 bit by bit, from MSB to LSB. Then call the direct function g(y), and if the result crosses x, revert the last bit change.
When using floats, you can use a variable holding the approximate value of the targeted mantissa bit instead of accessing integer bits directly. That eliminates the unknown-exponent problem. So at the beginning set y = y0 and the current bit to the MSB value, i.e. b = (y1 - y0)/2. After each iteration halve it, and do as many iterations as you have mantissa bits n... This way you obtain the result in n iterations within (y1 - y0)/2^n accuracy.
If your inverse function is not monotonic, break it into monotonic intervals and handle each as a separate binary search.
Whether the function is increasing or decreasing just determines the direction of the crossing condition (use of < or >).
C++ acos example
So y = acos(x) is defined on x = <-1, +1>, y = <0, M_PI> and is decreasing, so:
double f64_acos(double x)
{
    const int n = 52; // mantissa bits
    double y, y0, b;
    int i;
    // handle domain error
    if (x < -1.0) return 0;
    if (x > +1.0) return 0;
    // x = <-1,+1>, y = <0,M_PI>, decreasing
    for (y = 0.0, b = 0.5*M_PI, i = 0; i < n; i++, b *= 0.5) // y is min, b is half of max, halved each iteration
    {
        y0 = y;                 // remember original y
        y += b;                 // try to set the "bit"
        if (cos(y) < x) y = y0; // if the result crosses x, revert to the original y (decreasing uses <, increasing uses >)
    }
    return y;
}
I tested it like this:
double x0, x1, y;
for (x0 = 0.0; x0 < M_PI; x0 += M_PI*0.01) // cycle the whole angle range <0,M_PI>
{
    y = cos(x0);      // direct function (from math.h)
    x1 = f64_acos(y); // my inverse function
    if (fabs(x1 - x0) > 1e-9) // check the result and output to the log if it differs
        Form1->mm_log->Lines->Add(AnsiString().sprintf("acos(%8.3lf) = %8.3lf != %8.3lf", y, x0, x1));
}
No differences were found, so the implementation works correctly. Of course, a binary search on a 52-bit mantissa is usually slower than a polynomial approximation... on the other hand, the implementation is very simple...
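For comparison, the same idea in Python looks roughly like this (my own sketch, using math.cos from the standard library as the forward function):

import math

def acos_bisect(x, n=52):
    # approximate acos(x) on [-1, +1]; cos is decreasing on [0, pi], so we test with <
    if x < -1.0 or x > 1.0:
        return 0.0          # crude domain handling, as in the C++ version
    y = 0.0
    b = 0.5 * math.pi       # half of the output range [0, pi]
    for _ in range(n):      # one iteration per "mantissa bit"
        y0 = y
        y += b              # try to set this "bit"
        if math.cos(y) < x: # crossed x -> undo
            y = y0
        b *= 0.5
    return y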
[Notes]
If you do not want to take care of the monotonic intervals, you can try an approximation search instead.
As you are dealing with goniometric functions, you need to handle singularities to avoid NaN or division by zero, etc.
If you're interested, there are more binary-search examples (mostly on integers); Power by squaring for negative exponents contains some.
I'm calculating the Gini coefficient (similar to: Python - Gini coefficient calculation using Numpy) but I get an odd result. For a uniform distribution sampled from np.random.rand(), the Gini coefficient is 0.3, but I would have expected it to be close to 0 (perfect equality). What is going wrong here?
import numpy as np
import matplotlib.pyplot as plt

def G(v):
    bins = np.linspace(0., 100., 11)
    total = float(np.sum(v))
    yvals = []
    for b in bins:
        bin_vals = v[v <= np.percentile(v, b)]
        bin_fraction = (np.sum(bin_vals) / total) * 100.0
        yvals.append(bin_fraction)
    # perfect equality area
    pe_area = np.trapz(bins, x=bins)
    # lorenz area
    lorenz_area = np.trapz(yvals, x=bins)
    gini_val = (pe_area - lorenz_area) / float(pe_area)
    return bins, yvals, gini_val
v = np.random.rand(500)
bins, result, gini_val = G(v)
plt.figure()
plt.subplot(2, 1, 1)
plt.plot(bins, result, label="observed")
plt.plot(bins, bins, '--', label="perfect eq.")
plt.xlabel("fraction of population")
plt.ylabel("fraction of wealth")
plt.title("GINI: %.4f" %(gini_val))
plt.legend()
plt.subplot(2, 1, 2)
plt.hist(v, bins=20)
For the given set of numbers, the above code calculates the fraction of the total distribution's values that fall in each percentile bin.
The result: a uniform distribution should be near "perfect equality", so the bending of the Lorenz curve looks off.
This is to be expected. A random sample from a uniform distribution does not result in uniform values (i.e. values that are all relatively close to each other). With a little calculus, it can be shown that the expected value (in the statistical sense) of the Gini coefficient of a sample from the uniform distribution on [0, 1] is 1/3, so getting values around 1/3 for a given sample is reasonable.
You'll get a lower Gini coefficient with a sample such as v = 10 + np.random.rand(500). Those values are all close to 10.5; the relative variation is lower than in the sample v = np.random.rand(500).
In fact, the expected value of the Gini coefficient for the sample base + np.random.rand(n) is 1/(6*base + 3).
Here's a simple implementation of the Gini coefficient. It uses the fact that the Gini coefficient is half the relative mean absolute difference.
def gini(x):
    # (Warning: This is a concise implementation, but it is O(n**2)
    # in time and memory, where n = len(x). *Don't* pass in huge
    # samples!)

    # Mean absolute difference
    mad = np.abs(np.subtract.outer(x, x)).mean()
    # Relative mean absolute difference
    rmad = mad / np.mean(x)
    # Gini coefficient
    g = 0.5 * rmad
    return g
(For some more efficient implementations, see More efficient weighted Gini coefficient in Python)
Here's the Gini coefficient for several samples of the form v = base + np.random.rand(500):
In [80]: v = np.random.rand(500)
In [81]: gini(v)
Out[81]: 0.32760618249832563
In [82]: v = 1 + np.random.rand(500)
In [83]: gini(v)
Out[83]: 0.11121487509454202
In [84]: v = 10 + np.random.rand(500)
In [85]: gini(v)
Out[85]: 0.01567937753659053
In [86]: v = 100 + np.random.rand(500)
In [87]: gini(v)
Out[87]: 0.0016594595244509495
A slightly faster implementation (using numpy vectorization and only computing each difference once):
def gini_coefficient(x):
    """Compute Gini coefficient of array of values"""
    diffsum = 0
    for i, xi in enumerate(x[:-1], 1):
        diffsum += np.sum(np.abs(xi - x[i:]))
    return diffsum / (len(x)**2 * np.mean(x))
Note: x must be a numpy array.
The Gini coefficient is computed from the area between the line of perfect equality and the Lorenz curve, and is usually used to analyze the distribution of income in a population. https://github.com/oliviaguest/gini provides a simple Python implementation.
A quick note on the original methodology:
When calculating Gini coefficients directly from areas under curves with np.trapz or another integration method, the first value of the Lorenz curve needs to be 0 so that the area between the origin and the second value is accounted for. The following changes to G(v) fix this:
yvals = [0]
for b in bins[1:]:
I also discussed this issue in this answer, where including the origin in those calculations gives an answer equivalent to the other methods discussed here (which do not need 0 to be appended).
In short, when calculating Gini coefficients directly using integration, start from the origin. If using the other methods discussed here, it's not needed.
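Putting those changes into the original function, the corrected G(v) would look roughly like this (a sketch assembled from the question's code plus the fix described above, not independently verified against the author's data):

import numpy as np

def G(v):
    bins = np.linspace(0., 100., 11)
    total = float(np.sum(v))
    yvals = [0]                # Lorenz curve now starts at the origin
    for b in bins[1:]:         # skip the 0th percentile
        bin_vals = v[v <= np.percentile(v, b)]
        bin_fraction = (np.sum(bin_vals) / total) * 100.0
        yvals.append(bin_fraction)
    pe_area = np.trapz(bins, x=bins)      # perfect equality area
    lorenz_area = np.trapz(yvals, x=bins) # Lorenz area
    gini_val = (pe_area - lorenz_area) / float(pe_area)
    return bins, yvals, gini_val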
Note that the Gini index is currently available in skbio.diversity.alpha as gini_index. It might give slightly different results from the examples mentioned above.
You are getting the right answer. The Gini coefficient of the uniform distribution on [a, b] is not 0 ("perfect equality"), but (b - a) / (3*(b + a)). In your case, b = 1 and a = 0, so Gini = 1/3.
The only distributions with perfect equality are the Kronecker and Dirac deltas. Remember that equality means "all the same", not "all equally probable".
There were some issues with the previous implementations: they never give a Gini index of 1 for perfectly sparse data.
For example:
def gini_coefficient(x):
    """Compute Gini coefficient of array of values"""
    diffsum = 0
    for i, xi in enumerate(x[:-1], 1):
        diffsum += np.sum(np.abs(xi - x[i:]))
    return diffsum / (len(x)**2 * np.mean(x))
gini_coefficient(np.array([0, 0, 1]))
gives the answer 0.666666. That happens because of the implied "integration scheme" it uses.
Here is another variant that bypasses the issue, although it is computationally heavier:
import numpy as np
from scipy.interpolate import interp1d

def gini(v, n_new=1000):
    """Compute Gini coefficient of array of values"""
    v_abs = np.sort(np.abs(v))
    cumsum_v = np.cumsum(v_abs)
    n = len(v_abs)
    vals = np.concatenate([[0], cumsum_v / cumsum_v[-1]])
    x = np.linspace(0, 1, n + 1)
    f = interp1d(x=x, y=vals, kind='previous')
    xnew = np.linspace(0, 1, n_new + 1)
    dx_new = 1 / n_new
    vals_new = f(xnew)
    return 1 - 2 * np.trapz(y=vals_new, x=xnew, dx=dx_new)
gini(np.array([0, 0, 1]))
It gives 0.999 as the output, which is closer to what one wants to have =)
I have 10k data points like this:
0.010222
0.010345
0.010465
0.010611
0.010768
0.010890
0.011049
0.011206
0.011329
0.011465
0.011613
0.11763
0.011888
0.012015
0.012154
0.012282
0.012408
0.012524
....
I want to calculate Lyapunov exponent for that. This is what I've done so far:
lyapunovs = []
eps = 0.0001
for i in range(N):
    for j in range(i + 1, N):
        if np.abs(data[i] - data[j]) < eps:
            for k in range(1, min(N - i, N - j)):
                d0 = np.abs(data[i] - data[j])
                dn = np.abs(data[i + k] - data[j + k])
                lyapunovs.append(math.log(dn) - math.log(d0)) # problem
My problem is that I don't know whether the first Lyapunov exponent is the average of all the lyapunovs when k = 1, or the average of all the lyapunovs for the first time that data[i] - data[j] < eps.
Is this the right implementation of the Lyapunov exponent?
And this is the formula for the numerical calculation of the Lyapunov exponent (shown as an image in the original post).
I would calculate the Lyapunov exponent in this way and then output the results as tuples in a file (see this blog post: https://blog.abhranil.net/2014/07/22/calculating-the-lyapunov-exponent-of-a-time-series-with-python-code/):
from math import log
import numpy as np

with open('data.txt', 'r') as f:
    data = [float(i) for i in f.read().split()]

N = len(data)
eps = 0.001
lyapunovs = [[] for i in range(N)]

for i in range(N):
    for j in range(i + 1, N):
        if np.abs(data[i] - data[j]) < eps:
            for k in range(min(N - i, N - j)):
                lyapunovs[k].append(log(np.abs(data[i+k] - data[j+k])))

with open('lyapunov.txt', 'w') as f:
    for i in range(len(lyapunovs)):
        if len(lyapunovs[i]):
            string = str((i, sum(lyapunovs[i]) / len(lyapunovs[i])))
            f.write(string + '\n')
I see from the chosen loop structure in the question that a triangle of the Cartesian product of the points is being used. This might improve the estimate of the derivatives, which are susceptible to noise, but it is not part of the Lyapunov exponent explicitly. See this example of the calculations on a known function in the absence of measurement error. Feel free to look into that aspect more, but below I will assume the comparison of signal points adjacent in time.
Your original question uses NumPy, so I will also make use of it. One of the rules of thumb to using NumPy well is to avoid loops, although it is possible to vectorize functions that contain loops. With no explicit time measurements, and no repeated values, you could simply do:
import numpy as np
x = np.random.normal(0,1,size=10**4) # Mock signal data
np.mean(np.log(np.abs(np.diff(x))))
Or if the signal is paired with an array of timepoints, then the numerical derivative can involve time:
import numpy as np
x = np.random.normal(0,1,size=10**4) # Mock signal data
t = np.arange(10**4) # Mock time data
np.mean(np.log(np.abs(np.diff(x) / np.diff(t))))
However, in some datasets it is possible for adjacent values to repeat! This can occur when you've measured the signal only to a few decimal places, and it is a problem because it leads to np.log(0) (=-np.inf) which will blow up your calculation. A simple solution is to remove duplicated values, but this will only be suitable if duplicates are relatively rare and you have a large sample size. It is possible to estimate an upper bound on the estimate of the L-exponent by considering the precision of your measurements, but that is not the estimate of the L-exponent itself.
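As a minimal sketch of that duplicate removal (assuming data holds the signal values from the question; the variable names are illustrative):

import numpy as np

x = np.asarray(data)
keep = np.insert(np.diff(x) != 0, 0, True)  # drop values equal to their predecessor
x_unique = x[keep]
lyap_estimate = np.mean(np.log(np.abs(np.diff(x_unique))))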
I just want to mention that knowing the analytical expression is best.
I will take the logistic map equation as an example:
def logisticmap(x_init, r, length):
    x = [x_init]
    for t in range(length):
        x.append(r*x[-1]*(1-x[-1]))
    return np.array(x)
Now let's generate the data:
x = logisticmap(0.2, 3.92, 1000)
plt.plot(x)
plt.show()
(Plot of the logistic map time series.)
Here is the solution proposed by Galan:
np.mean(np.log(abs(np.diff(x))))
which gives: -1.0379
When you derive the Lyapunov exponent from the logistic map equation:
np.mean(np.log(abs(r*(1-2*x))))
it gives: 0.538296
which is the actual true value of the Lyapunov exponent. Since the system is in its chaotic regime, it must be positive, so I guess the evaluation from data points is not working in this example. You can try with more data points, but it will still give you a negative LE.
Unfortunately I don't know enough to guide you towards a better estimation of the Lyapunov exponent if you can't derive a mathematical expression, but I would be interested to know!
I tried to reduce computational complexity with numpy vectorization.
def lyapunov_exponent(series: np.ndarray, threshold: float) -> np.ndarray:
    N = len(series)
    eps = threshold
    L = [np.array([0]*N)]
    for i in range(1, N):
        diff = np.abs(series[i:] - series[:-i])
        dist = np.log(diff)
        L.append(np.concatenate([[0]*i, dist]))
    L = np.array(L)
    tf_L = np.where(L < eps, 1, 0)
    count_L = np.zeros_like(tf_L)
    for i in range(N):
        indices = (np.array(range(0, N-i)), np.array(range(i, N)))
        count_L[indices] = np.cumsum(tf_L[indices])
    avg = np.sum(count_L * L, axis=0) / np.sum(count_L, axis=0)
    return avg
If there is room for improvement, or you get a different result from those already answered, please reply.
Does anybody know a simple way to implement a recursive least squares function in Python?
I want a fast way to regress out a linear drift ([1 2 ... n], where n is the number of time points up until now) from my incoming signal every time it updates. RLS is typically what is used to do this, because the computing time does not increase as the number of time points increases.
The RLS algorithm is implemented in Python Padasip library. You can check the code on github: Padasip source codes
Or you can use directly the library. See documentation for Padasip RLS algorithm
The least squares fit of a line to data t[], x[] is given by
x = xbar + (C/V)*(t-tbar)
where
xbar = Sum{ x[i] } / N
tbar = Sum{ t[i] } / N
V = Sum{ (t[i]-tbar)^2 } / N
C = Sum{ (x[i]-xbar)*(t[i]-tbar) } / N
You can compute xbar,tbar,V and C incrementally like this:
Initially
N = 0
xbar = tbar = C = V = 0
Incorporating data t,x:
N += 1
f = 1.0/N
dx = x - xbar
dt = t - tbar
xbar += f*dx
tbar += f*dt
V = (1.0-f)*(V + f*dt*dt)
C = (1.0-f)*(C + f*dx*dt)
Note that until you have at least two data points V will be zero, and so there is no line. Note also that each x[] could be a vector; as long as xbar and C are also computed as vectors the same formulae work.
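As a minimal Python sketch of these update rules (my own wrapper, not from the original answer; the class and method names are illustrative), the running fit can be kept in a small class and used to subtract the current drift estimate from each incoming sample:

import numpy as np

class IncrementalLinearFit:
    # Incremental least-squares fit of x = xbar + (C/V)*(t - tbar),
    # using the update rules above.
    def __init__(self):
        self.N = 0
        self.xbar = self.tbar = self.V = self.C = 0.0

    def update(self, t, x):
        self.N += 1
        f = 1.0 / self.N
        dx = x - self.xbar
        dt = t - self.tbar
        self.xbar += f * dx
        self.tbar += f * dt
        self.V = (1.0 - f) * (self.V + f * dt * dt)
        self.C = (1.0 - f) * (self.C + f * dx * dt)

    def predict(self, t):
        slope = self.C / self.V if self.V > 0 else 0.0
        return self.xbar + slope * (t - self.tbar)

    def detrend(self, t, x):
        return x - self.predict(t)

# quick sanity check against np.polyfit
rng = np.random.default_rng(0)
ts = np.arange(100.0)
xs = 0.05 * ts + rng.normal(size=100)
fit = IncrementalLinearFit()
for t, x in zip(ts, xs):
    fit.update(t, x)
print(fit.C / fit.V, np.polyfit(ts, xs, 1)[0])  # the two slope estimates should agree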