Performing Householder Reflection of a Vector for QR Decomposition - Python

This question was asked here before.
However, the solution there was not satisfactory for me; I am still stuck at a 33% mismatch, so I felt the need to reopen the topic (and the author of that thread never added a proper answer after solving the issue for themselves).
The code that I have written is here:
import numpy as np

def householder(vec):
    vec = np.asarray(vec, dtype=float)
    if vec.ndim != 1:
        raise ValueError("vec.ndim = %s, expected 1" % vec.ndim)
    n = len(vec)
    I = np.eye(n)
    e1 = np.zeros_like(vec)
    e1[0] = 1.0
    V1 = e1 * np.linalg.norm(vec)
    print("V1:", V1)
    u = vec
    # algebraically equal to vec[0] - norm(vec), written to avoid cancellation
    u[0] = -(np.sum(np.square(u[1:]))) / (vec[0] + np.linalg.norm(vec))
    u = u / np.linalg.norm(u)
    H = I - 2 * np.outer(u, u)
    return V1, H
Here is the test case that this code is supposed to pass:
from numpy.testing import assert_allclose

v = np.array([1, 2, 3])
v1, h = householder(v)
assert_allclose(np.dot(h, v1), v)
assert_allclose(np.dot(h, v), v1)
The first assertion passes successfully; however, the second one gives me a 33% mismatch:
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
Mismatch: 33.3%
Max absolute difference: 4.4408921e-16
Max relative difference: 1.18687834e-16
x: array([3.741657e+00, 2.220446e-16, 0.000000e+00])
y: array([3.741657, 0. , 0. ])
I have been trying everything for about five hours now, and I feel like I'm wasting too much time on this. Any help getting this code to pass the test would be much appreciated.

Well, it looks correct to me.
The problem seems to be the parameters of the assert_allclose function. Specifically, it reports whether or not
absolute(a - b) <= (atol + rtol * absolute(b))
holds for each pair of entries a and b. According to the docs, the absolute tolerance is 1e-8 for the ordinary allclose function; however, assert_allclose's atol parameter defaults to 0.
Since your target b is zero, any value != 0 is not close with respect to this function, even though the two values are certainly reasonably close.
I recommend setting atol to 1e-8, i.e.
assert_allclose(np.dot(h, v), v1, atol=1e-8)
I am not quite sure why the numpy people chose different default parameters for the ordinary allclose and assert_allclose, though...
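To illustrate with the very numbers from your error message (a minimal sketch):
import numpy as np
from numpy.testing import assert_allclose

a = np.array([3.741657, 2.2e-16, 0.0])  # roughly your np.dot(h, v)
b = np.array([3.741657, 0.0, 0.0])      # roughly your v1
# assert_allclose(a, b)                 # fails: atol defaults to 0
assert_allclose(a, b, atol=1e-8)        # passes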

Related

Any way to get rid of `math.floor` for positive odd integers with `sympy.simplify`?

I'm trying to simplify some expressions involving positive odd integers with sympy, but sympy refuses to expand floor, which stalls the simplification.
To be specific, x is a positive odd integer (actually, in my particular use case the constraint is even stricter, but odd and positive is all sympy can express, which is fine). x // 2 should always equal (x - 1) / 2. Example code here:
from sympy import Symbol, simplify
x = Symbol('x', odd=True, positive=True)
expr = x // 2 - (x - 1) / 2
print(simplify(expr))
prints -x/2 + floor(x/2) + 1/2. Ideally it should print 0.
What I've tried so far:
Simplify (x - 1) // 2 - (x - 1) / 2. Turns out to be 0.
Multiply the whole thing by 2: 2 * (x // 2 - (x - 1) / 2). Gives me: -x + 2*floor(x/2) + 1.
Try to put more weight on the floor op by customizing the measure. No luck.
Use the sympy.core.evaluate(False) context when creating the expression. Nope.
Tune other parameters like ratio and rational, and play with other functions like expand, factor, collect. Doesn't work either.
EDIT: Wolfram Alpha can do this.
I tried looking at the assumptions of x along with some expressions. It surprises me that ((x - 1) / 2).is_integer returns None, which means unknown.
I'm running out of clues. I'm even looking for alternatives to sympy. Any ideas, guys?
I fail to see why sympy can't simplify that.
But, on the other hand, I've only just discovered the existence of the odd assumption, with your question.
What I would have done, without knowing about odd, is
k = Symbol('k', positive=True, integer=True)
x = 2*k-1
expr = x // 2 - (x - 1) / 2
Then, expr is 0, without even the need to simplify.
So I can't say why your way doesn't work (or why that odd assumption exists if it isn't used to deduce that x - 1 is even, and therefore (x - 1)/2 an integer). But, in the meantime, my way of defining an odd integer x works.
There is some reluctance to make too much automatic in SymPy, but this seems like a case that could be addressed (since (x - 1)/2 is simpler than floor(x/2)). Until then, however, you can run a replacement on your expression that makes this transformation for you.
Let's define a preferred version of floor:
from sympy import floor

def _floor(x):
    # rewrite floor(n/2) exactly when the parity of n is known
    n, d = x.as_numer_denom()
    if d == 2:
        if n.is_odd:
            return (n - 1)/2
        if n.is_even:
            return n/2
    return floor(x)
When you have an expression with floor that you want to evaluate, replace floor with _floor:
>>> x = Symbol('x', odd=True)
>>> eq=x // 2 - (x - 1) / 2
>>> eq.replace(floor, _floor)
0

How to avoid negative solutions from sympy.solve?

I am having the very same problem asked in this question, but I can't figure out why the solution there is not working.
In that question there was an issue with the sqrt function that seems to have been solved, so that problem now yields only positive results.
But in my problem, I can't eliminate the negative solution in the following code:
import sympy
v,Vs,Vp = sympy.symbols('v,Vs,Vp',real=True,positive=True)
sympy.solve( v - (Vp**2-2*Vs**2)/(2*(Vp**2-Vs**2)), Vs)
Which gives me the result
[-sqrt(2)*Vp*sqrt((2*v - 1)/(v - 1))/2, sqrt(2)*Vp*sqrt((2*v - 1)/(v - 1))/2]
How can I get only the positive result? What am I missing?
As the comments in the thread already describe, it is not really possible to get what you want in general.
There is a trick to assume 0 < v < 1/2. Since this involves a few fractions, intuition says that we should probably make a substitution that involves a fraction too.
import sympy
Vs,Vp = sympy.symbols('Vs,Vp', positive=True)
# A hack to assume 0 < v < 1/2
u = sympy.symbols('u', positive=True)
v = 1/(u+2) # Alternatives like atan can be used when there are trig functions
sol = sympy.solve( v - (Vp**2-2*Vs**2)/(2*(Vp**2-Vs**2)), Vs)
print(sol)
# Substitute back by redefining v
v = sympy.symbols('v', positive=True)
new_sol = [subsol.subs(u, 1/v - 2).simplify() for subsol in sol]
print(new_sol)
The next best thing you can do in this case is assume all square roots are positive, which is a very brave assumption.
import sympy
v,Vs,Vp = sympy.symbols('v,Vs,Vp', real=True, positive=True)
sol = sympy.solve( v - (Vp**2-2*Vs**2)/(2*(Vp**2-Vs**2)), Vs)
# Assume sqrts are positive and sol is an array
# Both of these are not true in general
# It does not work if we assume the square root can be zero
# Or even complex or negative
s = sympy.symbols('s', positive=True) # Represents any square root
w = sympy.Wild('w') # Represents any argument inside a square root
new_sol = [subsol for subsol in sol if subsol.replace(sympy.sqrt(w), s) > 0]
print(new_sol)
Both code blocks assume sol is a flat list, which is not true in general when it comes to solve.
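A small sketch of one way to guard against that (here solve does return a flat list, so the check is a no-op; it only matters for other kinds of input):
import sympy

v, Vs, Vp = sympy.symbols('v,Vs,Vp', real=True, positive=True)
raw = sympy.solve(v - (Vp**2 - 2*Vs**2)/(2*(Vp**2 - Vs**2)), Vs)
# solve may return other shapes (e.g. a dict) depending on its input,
# so normalize to a list before filtering:
sol = raw if isinstance(raw, list) else [raw]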

Is there a way to find a symbolic function's vertical bounds?

Given a SymPy function f(x) and values a, b (a != b), is there a way to find the minimum and maximum values of f(x) on this interval? I've found some code for finding extrema that can be adapted for this purpose (split them into min and max arrays, find the lowest and highest values respectively with lambdify, and use those), but surely there must be an easier way?
An alternative option would be using np.linspace, but then I might miss out on exact values, which would be bad for the things I have to do with them next.
As now noted in the cited page, since this PR you should be able to do the following:
from sympy import Symbol, Interval
from sympy.calculus.util import minimum, maximum

x = Symbol('x')
f = (x**3 / 3) - (2 * x**2) - 3 * x + 1
ivl = Interval(0, 3)  # e.g. your (a, b)
print(minimum(f, x, ivl))
print(maximum(f, x, ivl))

How to perform two-sample one-tailed t-test with numpy/scipy

In R, it is possible to perform a two-sample one-tailed t-test simply by using
> A = c(0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846)
> B = c(0.6383447, 0.5271385, 1.7721380, 1.7817880)
> t.test(A, B, alternative="greater")
Welch Two Sample t-test
data: A and B
t = -0.4189, df = 6.409, p-value = 0.6555
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-1.029916 Inf
sample estimates:
mean of x mean of y
0.9954942 1.1798523
In the Python world, scipy provides a similar function, ttest_ind, but it can only do two-tailed t-tests. The closest information on the topic I found is this link, but it seems to be rather a discussion of the policy of implementing one-tailed vs. two-tailed tests in scipy.
Therefore, my question is: does anyone know of any examples or instructions on how to perform the one-tailed version of the test using numpy/scipy?
From your mailing list link:
because the one-sided tests can be backed out from the two-sided
tests. (With symmetric distributions one-sided p-value is just half
of the two-sided pvalue)
It goes on to say that scipy always gives the test statistic as signed. This means that given p and t values from a two-tailed test, you would reject the null hypothesis of a greater-than test when p/2 < alpha and t > 0, and of a less-than test when p/2 < alpha and t < 0.
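For instance, a minimal sketch of that decision rule, using the data from the question (alpha is illustrative):
import numpy as np
from scipy import stats

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

t, p_two_sided = stats.ttest_ind(A, B, equal_var=False)  # Welch, as in the R call
alpha = 0.05
# "greater" alternative (A.mean() > B.mean()): reject H0 when p/2 < alpha and t > 0
print(p_two_sided / 2 < alpha and t > 0)  # False: cannot reject here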
After trying to add some insights as comments to the accepted answer, and not being able to write them down properly because of the general restrictions on comments, I decided to put in my two cents as a full answer.
First let's formulate our investigative question properly. The data we are investigating is
A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])
with the sample means
A.mean() = 0.99549419
B.mean() = 1.1798523
I assume that since the mean of B is obviously greater than the mean of A, you would like to check if this result is statistically significant.
So we have the Null Hypothesis
H0: A >= B
that we would like to reject in favor of the Alternative Hypothesis
H1: B > A
Now when you call scipy.stats.ttest_ind(x, y), this performs a hypothesis test on the value of x.mean() - y.mean(), which means that in order to get positive values throughout the calculation (which simplifies all considerations) we have to call
stats.ttest_ind(B,A)
instead of stats.ttest_ind(A,B). We get as an answer
t-value = 0.42210654140239207
p-value = 0.68406235191764142
and since according to the documentation this is the output for a two-tailed t-test, we must divide p by 2 for our one-tailed test. So depending on the significance level alpha you have chosen, you need
p/2 < alpha
in order to reject the null hypothesis H0. For alpha = 0.05 this is clearly not the case, so you cannot reject H0.
An alternative way to decide if you reject H0 without having to do any algebra on t or p is by looking at the t-value and comparing it with the critical t-value t_crit at the desired level of confidence (e.g. 95%) for the number of degrees of freedom df that applies to your problem. Since we have
df = sample_size_1 + sample_size_2 - 2 = 8
we get from a statistical table like this one that
t_crit(df=8, confidence_level=95%) = 1.860
We clearly have
t < t_crit
so we obtain again the same result, namely that we cannot reject H0.
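If you would rather compute t_crit than read it off a table, scipy's t distribution can do it; a minimal sketch:
from scipy import stats

t_crit = stats.t.ppf(0.95, df=8)  # one-tailed, 95% confidence, df = 8
print(t_crit)  # ~1.860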
import numpy as np
from scipy.stats import ttest_ind

def t_test(x, y, alternative='both-sided'):
    _, double_p = ttest_ind(x, y, equal_var=False)
    if alternative == 'both-sided':
        pval = double_p
    elif alternative == 'greater':
        if np.mean(x) > np.mean(y):
            pval = double_p / 2.
        else:
            pval = 1.0 - double_p / 2.
    elif alternative == 'less':
        if np.mean(x) < np.mean(y):
            pval = double_p / 2.
        else:
            pval = 1.0 - double_p / 2.
    return pval
A = [0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846]
B = [0.6383447, 0.5271385, 1.7721380, 1.7817880]
print(t_test(A,B,alternative='greater'))
0.6555098817758839
When the null hypothesis is Ho: P1 >= P2 and the alternative hypothesis is Ha: P1 < P2, you test it in Python by writing ttest_ind(P2, P1). (Notice that P2 comes first.)
import numpy as np
from scipy import stats

first = np.random.normal(3, 2, 400)
second = np.random.normal(6, 2, 400)
stats.ttest_ind(first, second, axis=0, equal_var=True)
You will get a result like the one below:
Ttest_indResult(statistic=-20.442436213923845,pvalue=5.0999336686332285e-75)
In Python, when the statistic is < 0, your real p-value is actually real_pvalue = 1 - output_pvalue/2 = 1 - 5.0999336686332285e-75/2, which is approximately 0.99. As your p-value is larger than 0.05, you cannot reject the null hypothesis that 6 >= 3. When the statistic is > 0, the real z-score is actually equal to -statistic, and the real p-value is equal to pvalue/2.
Ivc's answer should read: when (1 - p/2) < alpha and t < 0, you can reject the less-than hypothesis.
Based on this function from R: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/t.test
from scipy.stats import ttest_ind

def ttest(a, b, axis=0, equal_var=True, nan_policy='propagate',
          alternative='two.sided'):
    tval, pval = ttest_ind(a=a, b=b, axis=axis, equal_var=equal_var,
                           nan_policy=nan_policy)
    if alternative == 'greater':
        if tval < 0:
            pval = 1 - pval / 2
        else:
            pval = pval / 2
    elif alternative == 'less':
        if tval < 0:
            pval /= 2
        else:
            pval = 1 - pval / 2
    else:
        assert alternative == 'two.sided'
    return tval, pval
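For example, applied to the data from the question (a usage sketch; with equal_var=False this matches the Welch test R performs):
A = [0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846]
B = [0.6383447, 0.5271385, 1.7721380, 1.7817880]
tval, pval = ttest(A, B, equal_var=False, alternative='greater')
print(pval)  # ~0.6555, matching R's t.test(A, B, alternative="greater")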
Did you look at this:
How to calculate the statistics "t-test" with numpy
I think that is exactly what this question is looking at.
Basically:
import scipy.stats
x = [1,2,3,4]
scipy.stats.ttest_1samp(x, 0)
Ttest_1sampResult(statistic=3.872983346207417, pvalue=0.030466291662170977)
is the same result as this example in R. https://stats.stackexchange.com/questions/51242/statistical-difference-from-zero

Simultaneous Equations with given conditions

To start off, I have already solved this problem, so it's not a big deal; I'm just asking to satisfy my own curiosity. The question is how to solve a system of simultaneous equations given a set of constraints. The equations are:
tau = 62.4*d*0.0007
A = (b + 1.5*d)*d
P = b + 2*d*sqrt(1 + 1.5**2)
R = A/P
Q = (1.486/0.03)*A*(R**(2.0/3.0))*(0.0007**0.5)
and the conditions are:
tau <= 0.29, Q = 10000 +- say 3, and minimize b
As I mentioned, I was already able to come up with a solution using a pair of nested loops:
from numpy import linspace, sqrt

b = linspace(320, 330, 1000)
d = linspace(0.1, 6.6392, 1000)
ansQ = []
ansv = []
anstau = []
i_index = []
j_index = []
for i in range(len(b)):
    for j in range(len(d)):
        tau = 62.4*d[j]*0.0007
        A = (b[i] + 1.5*d[j])*d[j]
        P = b[i] + 2*d[j]*sqrt(1 + 1.5**2)
        R = A/P
        Q = (1.486/0.03)*A*(R**(2.0/3.0))*(0.0007**0.5)
        if Q >= 10000 and tau <= 0.29:
            ansQ.append(Q)
            ansv.append(Q/A)
            anstau.append(tau)
            i_index.append(i)
            j_index.append(j)
This takes a while, and there is something in the back of my head saying that there must be an easier/more elegant solution to this problem. Thanks (Linux Mint 13, Python 2.7.x, scipy 0.11.0)
You seem to have only two degrees of freedom here: you can rewrite everything in terms of b and d, or b and tau, or (pick your two favorites). Your constraint on tau directly implies a constraint on d (tau = 62.4*0.0007*d <= 0.29 gives d <= 0.29/(62.4*0.0007) ≈ 6.6392, exactly the upper bound of your d grid), and you can use your constraint on Q to imply a constraint on b.
And it doesn't look (to me at least; I still haven't finished my coffee) like your code is doing anything other than evaluating some two-dimensional functions over a grid you've defined, NOT solving a system of equations. I normally understand "solving" to involve setting something equal to something else and writing one variable as a function of another variable.
It does appear you've only posted a snippet, though, so I'll assume you do something else with your data downstream.
OK, I see. I think this isn't really a minimization problem, it's a plotting problem. The first thing I'd do is see what range your constraint on tau implies for d, and what range your constraint on Q then implies for b. Then you can mesh those points with meshgrid (as you mentioned below) and run over all combinations.
Since you're applying the constraint before you apply the mesh (as opposed to after, as in your code), you'll only be sampling the parameter space that you're interested in. In your code you generate a bunch of junk you're not interested in and pick out the gems; if you apply your constraints first, you'll only be left with gems! (See the vectorized sketch at the end of this answer.)
I'd define my functions like:
P = lambda b, d: b + 2*d*np.sqrt(1 + 1.5**2)
which works like
>>> import numpy as np
>>> P = lambda b, d: b + 2*d*np.sqrt(1 + 1.5**2)
>>> P(1,2)
8.2111025509279791
Then you can write another function to serve up b and d for you, so you can do something like:
def get_func_vals(b, d):
    pvals.append(P(b, d))
or, better yet, store b and d as tuples in a function that doesn't return but yields:
pvals = [P(b,d) for (b,d) in thing_that_yields_b_and_d_tuples]
I didn't test this last line of code, and I always screw up these parentheses, but I think it's right.
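A vectorized sketch of that constraint-first idea (grid sizes are illustrative; the bound on d follows from tau <= 0.29):
import numpy as np

d_max = 0.29 / (62.4 * 0.0007)   # tau = 62.4*d*0.0007 <= 0.29  =>  d <= ~6.6392
b, d = np.meshgrid(np.linspace(320, 330, 1000),
                   np.linspace(0.1, d_max, 1000))

A = (b + 1.5*d) * d
P = b + 2*d*np.sqrt(1 + 1.5**2)
R = A / P
Q = (1.486/0.03) * A * R**(2.0/3.0) * 0.0007**0.5

feasible = Q >= 10000            # the tau constraint is already baked into the grid
print(b[feasible].min() if feasible.any() else "no feasible grid points")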
