I am using Python's scikit-learn package to implement PCA. I am getting a math domain error:
C:\Users\Akshenndra\Anaconda2\lib\site-packages\sklearn\decomposition\pca.pyc in _assess_dimension_(spectrum, rank, n_samples, n_features)
78 for j in range(i + 1, len(spectrum)):
79 pa += log((spectrum[i] - spectrum[j]) *
---> 80 (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)
81
82 ll = pu + pl + pv + pp - pa / 2. - rank * log(n_samples) / 2.
ValueError: math domain error
I already know that a math domain error is raised when we take the logarithm of a non-positive number, but I don't understand how there can be a negative number inside the logarithm here, because this code works fine for other datasets.
Maybe this is related to what is written on the scikit-learn website: "This implementation uses the scipy.linalg implementation of the singular value decomposition. It only works for dense arrays and is not scalable to large dimensional data." (My data contains a large number of 0 values.)
I think you should add 1 instead, as the NumPy log1p documentation page suggests.
Since log(p + 1) = 0 when p = 0 (while log(1e-99) is a huge negative number, about -228), and as the quote in the link says:
For real-valued input, log1p is accurate also for x so small that 1 + x == 1 in floating-point accuracy
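As a quick illustration of the difference (a minimal standalone sketch, not part of the scikit-learn code):
import math

x = 1e-20
print(math.log(1 + x))  # 0.0 -- the sum 1 + x already rounded to 1.0 in float64
print(math.log1p(x))    # 1e-20 -- log1p stays accurate for tiny x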
The code can be modified as follows to make the computation more robust:
for i in range(rank):
    for j in range(i + 1, len(spectrum)):
        pa += log((spectrum[i] - spectrum[j]) *
                  (1. / spectrum_[j] - 1. / spectrum_[i]) + 1) + log(n_samples + 1)

ll = pu + pl + pv + pp - pa / 2. - rank * log(n_samples + 1) / 2.
I don't know whether I am right or not, but I did find a way to solve it.
I printed some debugging information (the values of spectrum_[i] and spectrum_[j]) and found that sometimes they are the same!
(Or maybe they are not exactly the same, but too close to tell apart in floating point, I guess.)
So here:
pa += log((spectrum[i] - spectrum[j]) *
          (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)
the code raises an error when it calculates log(0).
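To confirm that this is what happens, one can scan the spectrum for near-duplicate values before the logarithm is taken (a small sketch; the spectrum values here are made up):
import numpy as np

spectrum = np.array([2.5, 1.0, 1.0 + 1e-16, 0.3])  # hypothetical eigenvalue spectrum
dupes = [(i, j)
         for i in range(len(spectrum))
         for j in range(i + 1, len(spectrum))
         if np.isclose(spectrum[i], spectrum[j])]
print(dupes)  # [(1, 2)] -- for such pairs spectrum[i] - spectrum[j] is 0 and log() fails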
My way to solve it is to add a very small number, 1e-99, so the call becomes log(0 + 1e-99). You can just change the line to:
pa += log((spectrum[i] - spectrum[j]) *
          (1. / spectrum_[j] - 1. / spectrum_[i]) + 1e-99) + log(n_samples)
I have the following simple function to evaluate.
def f0(wt):
    term1 = (1 + np.cos(wt)**2) * (1 / 3 - 2 / (wt)**2)
    term2 = np.sin(wt)**2
    term3 = 4 / (wt)**3 * np.cos(wt) * np.sin(wt)
    return 0.5 * (term1 + term2 + term3)
For small values of wt (order of 1e-4 and below), I seem to have numerical problems in the evaluation of the function. Indeed, term1 and term3 have very large, almost opposite values, while term2 is very small.
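One can see the cancellation directly by printing the three terms for a small argument (a quick sketch):
import numpy as np

wt = 1e-5
term1 = (1 + np.cos(wt)**2) * (1 / 3 - 2 / wt**2)
term2 = np.sin(wt)**2
term3 = 4 / wt**3 * np.cos(wt) * np.sin(wt)
print(term1, term3)           # two values near -4e10 and +4e10 that almost cancel
print(term1 + term2 + term3)  # the small remainder is dominated by rounding error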
I think I improved things slightly by splitting the sum of the three terms into two parts, as shown here:
def f1(wt):
    # Split the calculation to have more stability, hopefully
    term1 = (1 + np.cos(wt)**2) * (1 / 3 - 2 / (wt)**2)
    term2 = np.sin(wt)**2
    term3 = 4 / (wt)**3 * np.cos(wt) * np.sin(wt)
    partial = term1 + term3
    return 0.5 * (partial + term2)
However, for very small but positive values of wt, I think there are still numerical problems. I expect this function to be smooth for any positive value of wt, but, as you can see from the plot attached, at values below 1e-3, there are wild artifacts.
My question is: how can I improve the numerical precision of NumPy if I am already using the data type float64?
Note: I am on a 64-bit Windows 10 machine. I have read in other Stack Overflow threads that the class np.float128 is not available.
Full code snippet
import numpy as np
import matplotlib.pyplot as plt

wt = np.logspace(-6, 1, 1000)

def f0(wt):
    term1 = (1 + np.cos(wt)**2) * (1 / 3 - 2 / (wt)**2)
    term2 = np.sin(wt)**2
    term3 = 4 / (wt)**3 * np.cos(wt) * np.sin(wt)
    return 0.5 * (term1 + term2 + term3)

def f1(wt):
    # Split the calculation to have more stability, hopefully
    term1 = (1 + np.cos(wt)**2) * (1 / 3 - 2 / (wt)**2)
    term2 = np.sin(wt)**2
    term3 = 4 / (wt)**3 * np.cos(wt) * np.sin(wt)
    partial = term1 + term3
    return 0.5 * (partial + term2)

plt.figure()
plt.loglog(wt, f0(wt), label='f0')
plt.loglog(wt, f1(wt), label='f1')
plt.grid()
plt.legend()
plt.xlabel('wt')
plt.show()
How about replacing the sine and cosine with the first few terms of their Taylor series? Then SymPy is able to give you a simple result that is hopefully better suited numerically.
First I slightly change your function so that it gives me a SymPy expression.
from sympy import *

t = symbols('t')

def f0(wt):
    term1 = (1 + cos(wt)**2) * (Rational(1, 3) - 2 / (wt)**2)
    term2 = sin(wt)**2
    term3 = 4 / (wt)**3 * cos(wt) * sin(wt)
    return Rational(1, 2) * (term1 + term2 + term3)

expr = f0(t)
expr
Now I replace sin and cos with their Taylor polynomials.
def taylor(f, n):
    # Maclaurin polynomial of f, truncated before degree n
    return sum(t**i / factorial(i) * f(t).diff(t, i).subs(t, 0) for i in range(n))

tsin = taylor(sin, 7)
tcos = taylor(cos, 7)

expr2 = simplify(expr.subs(sin(t), tsin).subs(cos(t), tcos))
f1 = lambdify(t, expr2, 'numpy')
expr2
And finally I plot it using exactly your code. Notice that I am using SymPy's option to make a NumPy ufunc.
wt = np.logspace(-6, 1, 1000)

plt.figure()
plt.loglog(wt, f0(wt), label='f0')  # f0 here is the NumPy-based version from the question
plt.loglog(wt, f1(wt), label='f1')
plt.grid()
plt.legend()
plt.xlabel('wt')
plt.show()
Obviously this approximation is only good around zero, and for values between 1 and 10 you should use the original function. But in case you need convincing, you can crank the degree up to 25, which makes the Taylor-substituted function visually agree with yours at least up until 10.
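For instance, reusing the taylor helper above (a sketch; the simplify call may take noticeably longer at this degree, and f1_25 is just an illustrative name):
tsin25 = taylor(sin, 25)
tcos25 = taylor(cos, 25)
expr25 = simplify(expr.subs(sin(t), tsin25).subs(cos(t), tcos25))
f1_25 = lambdify(t, expr25, 'numpy')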
And you can combine the two functions, so that values around zero are calculated with my function and the rest with yours, like this:
def f2(wt):
    cond = np.abs(wt) > 1 / 10
    return np.piecewise(wt, [cond, ~cond], [f0, f1])
The problem you are facing is catastrophic cancellation, and it must not be solved using higher precision, as doing so will generally just postpone the actual problem. The root of the problem, a numerical instability, must be addressed by reformulating the mathematical expression.
Note that f1 is a bit better than f0, but the cancellation issue lies in term1 + term3.
By transforming the expression with simple expansion/factorization operations and trigonometric identities, one can get the following function:
def f2(wt):
    sw = np.sin(wt)
    sw2 = np.sin(2 * wt)
    return (sw / wt)**2 + 1 / 3 + (sw2 / wt - 2) / wt**2 + sw**2 / 3
This function is a bit more accurate but still contains a cancellation causing the same issue. This happens because of the expression E = (sw2 / wt - 2) / wt**2, which is the root of the problem. Indeed, np.sin(2*wt) / wt tends towards 2 when wt is near 0. Thus sw2 / wt - 2 is close to 0, and the expression E is numerically unstable because a close-to-zero value is divided by another close-to-zero value. If one can reformulate E analytically to remove the singularity, then the resulting expression will likely be numerically stable. For more information, you can look at the sinc function and how approximations of it are computed (it is also available in NumPy).
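The loss of accuracy in E is easy to observe numerically (a small sketch; analytically E tends to -4/3 as wt goes to 0):
import numpy as np

wt = np.array([1e-1, 1e-3, 1e-5, 1e-7])
E = (np.sin(2 * wt) / wt - 2) / wt**2
print(E)  # should approach -4/3 ~ -1.3333, but digits degrade as wt shrinks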
One simple way to solve this is to use numerical tools like Taylor series. A Taylor series can approximate the expression E accurately close to zero (thanks to its derivatives). Actually, one can use a Taylor series to compute the whole expression, not just E. However, using the Taylor series for values close to 1 gives inaccurate results; in fact, the accuracy of the method drops very quickly above 1. One solution is to use the Taylor series only for small values.
Here is the resulting implementation:
def f3(wt):
    sw = np.sin(wt)
    sw2 = np.sin(2 * wt)
    reference = (sw / wt)**2 + 1 / 3 + (sw2 / wt - 2) / wt**2 + sw**2 / 3
    # O(13) Taylor series computation, used only for near-zero values
    taylor = (   (  4. / 15.)        * wt**2
               - ( 29. / 315.)       * wt**4
               + ( 37. / 2835.)      * wt**6
               - (151. / 155925.)    * wt**8
               + (268. / 6081075.)   * wt**10
               - (866. / 638512875.) * wt**12)
    # Select the best implementation for each input value
    return np.where(np.logical_and(wt >= -0.2, wt <= 0.2), taylor, reference)
This implementation appears to be very accurate in practice (at least 12 digits of precision) while still being relatively fast.
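One way to sanity-check that claim is to compare f3 against a high-precision reference, for instance with mpmath (a sketch; assumes mpmath is installed):
import numpy as np
import mpmath

mpmath.mp.dps = 50  # 50-digit reference computation

def f_ref(x):
    x = mpmath.mpf(x)
    return (mpmath.mpf(1) / 2) * ((1 + mpmath.cos(x)**2) * (mpmath.mpf(1) / 3 - 2 / x**2)
                                  + mpmath.sin(x)**2
                                  + 4 / x**3 * mpmath.cos(x) * mpmath.sin(x))

for x in [1e-6, 1e-3, 0.1, 1.0, 5.0]:
    print(x, abs(float(f_ref(x)) - f3(np.array([x]))[0]))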
How do I simplify a*sin(wt) + b*cos(wt) into c*sin(wt+theta) using SymPy? For example:
f = sin(t) + 2*cos(t) = 2.236*sin(t + 1.107)
I tried the following:
from sympy import *

t = symbols('t')
f = sin(t) + 2*cos(t)
trigsimp(f)     # returns sin(t) + 2*cos(t)
simplify(f)     # returns sin(t) + 2*cos(t)
f.rewrite(sin)  # returns sin(t) + 2*sin(t + pi/2)
P.S.: I don't have direct access to a, b, and w, only to f.
Any suggestions?
The general answer can be achieved by noting that you want to have
a * sin(t) + b * cos(t) = A * (cos(c)*sin(t) + sin(c)*cos(t))
This leads to the simultaneous equations a = A * cos(c) and b = A * sin(c).
Dividing the second equation by the first, we can solve for c. Substituting its solution into the first equation, you can solve for A.
I followed the same pattern, but just to get it in terms of cos. If you want to get it in terms of sin, you can use Rodrigo's formula.
The following code should be able to take any linear combination of the form x * sin(t - w) or y * cos(t - z). There can be multiple sines and cosines.
from sympy import *

t = symbols('t', real=True)
expr = sin(t) + 2*cos(t)  # unknown

d = collect(expr.expand(trig=True), [sin(t), cos(t)], evaluate=False)
a = d[sin(t)]
b = d[cos(t)]

cos_phase = atan(a / b)
amplitude = a / sin(cos_phase)
print(amplitude.evalf() * cos(t - cos_phase.evalf()))
Which gives
2.23606797749979*cos(t - 0.463647609000806)
This seems to be a satisfactory match after plotting both graphs.
You could even have something like
expr = 2*sin(t - 3) + cos(t) - 3*cos(t - 2)
and it should work fine.
a * sin(wt) + b * cos(wt) = sqrt(a**2 + b**2) * sin(wt + acos(a / sqrt(a**2 + b**2)))
While the amplitude is the radical sqrt(a**2 + b**2), the phase is given by the arccosine of the ratio a / sqrt(a**2 + b**2), which may not be expressible in terms of arithmetic operations and radicals. Hence, you may be asking SymPy to do the impossible. Better to use floating-point values, but then you do not need SymPy for that.
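A quick numeric check of this identity with plain floats (no SymPy needed; acos gives the correct phase here because b is positive):
import math

a, b, t = 1.0, 2.0, 0.7
amplitude = math.hypot(a, b)      # sqrt(a**2 + b**2), ~2.236
phase = math.acos(a / amplitude)  # ~1.107, matching the example above
print(a * math.sin(t) + b * math.cos(t))  # 2.1739...
print(amplitude * math.sin(t + phase))    # same value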
I'm trying to get SymPy to solve a system of equations but it gives me an error saying:
NotImplementedError: could not solve 3*sin(3*t0/2)*tan(t0) + 2*cos(3*t0/2) - 4
Is there another way for me to be able to solve the system of equations:
sin(x) + (y - x)*cos(x) = 0
-1.5*(y - x)*sin(1.5*x) + cos(1.5*x) = 2
I used:
from sympy import *

x, y = symbols('x y')
solve([sin(x) + (y - x)*cos(x), -1.5*(y - x)*sin(1.5*x) + cos(1.5*x) - 2], x, y)
SymPy could do better with this equation, but ultimately it's equivalent to a 10th-degree polynomial whose roots can only be represented abstractly. I'll describe the steps one can take and show how far SymPy can go. It's a semi-manual solution process which should be more automatic.
First of all, don't put 1.5, or other floating point numbers, in the equations. Instead, introduce a coefficient a = Rational(3, 2) and use that:
a = Rational(3, 2)
eq = [sin(x) + (y - x)*cos(x), -a*(y - x)*sin(a*x) + cos(a*x) - 2]
Variable y can be eliminated using the first equation: y=x-tan(x), which is easy for us to see, but SymPy sometimes misses the opportunity. Let's help it:
eq1 = eq[1].subs(y, x-tan(x)) # 3*sin(3*x/2)*tan(x)/2 + cos(3*x/2) - 2
As is, solve and solveset (an alternative SymPy solver) give up on the equation because of this mix of trigonometric functions of different arguments. Some of us remember from school days that trigonometric functions can be expressed as rational functions of the tangent of half-argument, so let's do that: rewrite the equation in terms of tan.
eq2 = eq1.rewrite(tan) # (-tan(3*x/4)**2 + 1)/(tan(3*x/4)**2 + 1) - 2 + 3*tan(3*x/4)*tan(x)/(tan(3*x/4)**2 + 1)
As mentioned, this halves the argument. Having fractions like x/4 in trig functions is bad. Introduce a new symbol, var('u'), and make u = x/4:
eq3 = eq2.subs(x, 4*u) # (-tan(3*u)**2 + 1)/(tan(3*u)**2 + 1) - 2 + 3*tan(3*u)*tan(4*u)/(tan(3*u)**2 + 1)
Now we can expand all these tangents in terms of tan(u), using expand_trig. The equation gets longer:
eq4 = expand_trig(eq3) # (1 - (-tan(u)**3 + 3*tan(u))**2/(-3*tan(u)**2 + 1)**2)/(1 + (-tan(u)**3 + 3*tan(u))**2/(-3*tan(u)**2 + 1)**2) - 2 + 3*(-4*tan(u)**3 + 4*tan(u))*(-tan(u)**3 + 3*tan(u))/((1 + (-tan(u)**3 + 3*tan(u))**2/(-3*tan(u)**2 + 1)**2)*(-3*tan(u)**2 + 1)*(tan(u)**4 - 6*tan(u)**2 + 1))
But it's also simpler because tan(u) can be treated as another unknown, say v.
eq5 = eq4.subs(tan(u), v) # (1 - (-v**3 + 3*v)**2/(-3*v**2 + 1)**2)/(1 + (-v**3 + 3*v)**2/(-3*v**2 + 1)**2) - 2 + 3*(-4*v**3 + 4*v)*(-v**3 + 3*v)/((1 + (-v**3 + 3*v)**2/(-3*v**2 + 1)**2)*(-3*v**2 + 1)*(v**4 - 6*v**2 + 1))
Great, now we have a rational function. It can be handled with solveset(eq5, v). By default solveset gives all complex solutions, and we need only real roots among them, so let's specify the domain as Reals:
vsol = list(solveset(eq5, v, domain=S.Reals))
There is no algebraic formula for these, so they are recorded somewhat abstractly but these are actual numbers we can work with:
[CRootOf(3*v**10 + 9*v**8 - 78*v**6 + 22*v**4 - 21*v**2 + 1, 0),
CRootOf(3*v**10 + 9*v**8 - 78*v**6 + 22*v**4 - 21*v**2 + 1, 1),
CRootOf(3*v**10 + 9*v**8 - 78*v**6 + 22*v**4 - 21*v**2 + 1, 2),
CRootOf(3*v**10 + 9*v**8 - 78*v**6 + 22*v**4 - 21*v**2 + 1, 3)]
For example, we can go back to x and y now, and evaluate the solutions:
xsol = [4*atan(v) for v in vsol]
ysol = [x - tan(x) for x in xsol]
numsol = [(N(x), N(y)) for x, y in zip(xsol, ysol)]
Numeric values are
[(-4.35962510714700, -1.64344290066272),
(-0.877886785847899, 0.326585146723377),
(0.877886785847899, -0.326585146723377),
(4.35962510714700, 1.64344290066272)]
Of course there are infinitely more because the tangent is periodic. Finally, let's check these actually work:
residuals = [[e.subs({x: xv, y: yv}) for e in eq] for xv, yv in numsol]
These are a bunch of numbers of order 1e-15 or less, so yes, the equations hold within machine precision.
Unlike a purely numeric solution we'd get from SciPy or other numeric solvers, these can be evaluated with any accuracy without repeating the process. For example, 50 digits of the first x-solution:
xsol[0].evalf(50) # -4.3596251071470021258397061103704574594477338857831
Just for the fun of it, here is a manual solution that only needs solving a polynomial of degree 5:
Write t = x/2, a = y - x, s = sin t, c = cos t, S = sin x and C = cos x.
Then the given equations can be rewritten as
(1) 2 sc + a (c^2 - s^2) = 0
(2) 3 a s^3 - 9 a c^2 s - 6 c s^2 + 2 c^3 = 4
Multiplying (1) by 3 s and adding to (2):
(3) -6 a c^2 s + 2 c^3 = 4
Next we substitute a = -S / C and use S = 2sc and s^2 = 1 - c^2:
(4) 12 c^3 (1 - c^2) / C + 2 c^3 = 4
Multiply with C = 2 c^2 - 1:
(5) c^3 (12 - 12 c^2 + 4 c^2 - 2) = 8 c^2 - 4
Finally,
(6) 4 c^5 - 5 c^3 + 4 c^2 - 2 = 0
This has a pair of complex solutions, one real solution outside the domain of the cosine, and two more solutions which give the four principal solutions for x:
(7) c_1/2 = 0.90520121, -0.57206084
(8) x_1/2/3/4 = +/- 2 arccos(c_1/2)
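For completeness, (6) can also be solved numerically with NumPy and the x values recovered from c (a sketch):
import numpy as np

# Coefficients of 4c^5 + 0c^4 - 5c^3 + 4c^2 + 0c - 2, highest degree first
roots = np.roots([4, 0, -5, 4, 0, -2])
real_c = sorted(r.real for r in roots if abs(r.imag) < 1e-12 and abs(r.real) <= 1)
print(real_c)                              # ~[-0.57206084, 0.90520121]
print([2 * np.arccos(c) for c in real_c])  # ~[4.3596, 0.8779], matching the solutions above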
In this assignment I have completed all the problems except this one. I have to create a Python script to solve an equation (screenshot).
Unfortunately, in all my research over the internet I cannot figure out how to convert ln to log, or into anything else usable. The code I have written so far is below. I will also post the answer that our teacher says we should get.
import math

p = 100
r = 0.06 / 12
FV = 4000
n = str(ln * ((1 + (FV * r) / p) / (ln * (1 + r))))
print("Number of periods = " + str(n))
The answer I should get is 36.55539635919235
Any advice or help you have would be greatly appreciated!
Also, we are not using numpy. I already attempted that one.
Thanks!
math.log is the natural logarithm:
From the documentation:
math.log(x[, base]): With one argument, return the natural logarithm of x (to base e).
Your equation is therefore:
n = math.log(1 + (FV * r) / p) / math.log(1 + r)
Note that in your code you convert n to a str twice, which is unnecessary.
Here is the correct implementation using NumPy (np.log() is the natural logarithm):
import numpy as np

p = 100
r = 0.06 / 12
FV = 4000
n = np.log(1 + FV * r / p) / np.log(1 + r)
print("Number of periods = " + str(n))
Output:
Number of periods = 36.55539635919235
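As a sanity check, plugging n back into what appears to be the underlying relation (the future value of an ordinary annuity, FV = p * ((1 + r)**n - 1) / r; an assumption here, since the screenshot is not shown) recovers the target value:
print(p * ((1 + r)**n - 1) / r)  # ~4000.0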
In Python/NumPy, is there a way to build an expression containing factorials but defer its evaluation until I instruct the runtime to compute it, since in my scenario many factorials will be duplicated or can be reduced?
Let's say F(x) := x!
And I build an expression like (F(6) + F(7)) / F(4). I can greatly accelerate this, even do it in my head, by computing
(F(6) * (1 + 7)) / F(4)
= 5 * 6 * 8
= 240
Basically, I'm going to generate such expressions and would like the computer to be smart and not compute all factorials by multiplying all the way down to 1, i.e., using my example, not actually do
(6*5*4*3*2 + 7*6*5*4*3*2) / (4*3*2)
I've actually started developing a Factorial class, but I'm new to Python and NumPy and was wondering if this is a problem that's already solved.
As @Oleg has suggested, you can do this with SymPy:
import numpy as np
import sympy as sp
# preparation
n = sp.symbols("n")
F = sp.factorial
# create the equation
f = (F(n) + F(n + 1)) / F(n - 2)
print(f) # => (factorial(n) + factorial(n + 1))/factorial(n - 2)
# reduce it
f = f.simplify()
print(f) # => n*(n - 1)*(n + 2)
# evaluate it in SymPy
# Note: very slow!
print(f.subs(n, 6)) # => 240
# turn it into a numpy function
# Note: much faster!
f = sp.lambdify(n, f, "numpy")
a = np.arange(2, 10)
print(f(a)) # => [ 8 30 72 140 240 378 560 792]
Maybe you could look into increasing the efficiency using table lookups, if space efficiency isn't a major concern. It would greatly reduce the number of repeated calculations. The following isn't terribly efficient, but it's the basic idea.
cache = {0: 1, 1: 1}  # seed both base cases so cached_factorial(0) terminates

def cached_factorial(n):
    if n in cache:
        return cache[n]
    else:
        result = n * cached_factorial(n - 1)
        cache[n] = result
        return result
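A usage sketch with the expression from the question:
F = cached_factorial
print((F(6) + F(7)) // F(4))  # 240 -- each factorial is computed (and cached) only once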