This question is about precision of computation using NumPy vs. Octave/MATLAB (the MATLAB code below has only been tested with Octave, however). I am aware of a similar question on Stackoverflow, namely this, but that seems somewhat far from what I'm asking below.
Setup
Everything is running on Ubuntu 14.04.
Python version 3.4.0.
NumPy version 1.8.1 compiled against OpenBLAS.
Octave version 3.8.1 compiled against OpenBLAS.
Sample Code
Sample Python code.
import numpy as np
from scipy import linalg as la
def build_laplacian(n):
lap=np.zeros([n,n])
for j in range(n-1):
lap[j+1][j]=1
lap[j][j+1]=1
lap[n-1][n-2]=1
lap[n-2][n-1]=1
return lap
def evolve(s, lap):
wave=la.expm(-1j*s*lap).dot([1]+[0]*(lap.shape[0]-1))
for i in range(len(wave)):
wave[i]=np.linalg.norm(wave[i])**2
return wave
We now run the following.
np.min(evolve(2, build_laplacian(500)))
which gives something on the order of e-34.
We can produce similar code in Octave/MATLAB:
function lap=build_laplacian(n)
lap=zeros(n,n);
for i=1:(n-1)
lap(i+1,i)=1;
lap(i,i+1)=1;
end
lap(n,n-1)=1;
lap(n-1,n)=1;
end
function result=evolve(s, lap)
d=zeros(length(lap(:,1)),1); d(1)=1;
result=expm(-1i*s*lap)*d;
for i=1:length(result)
result(i)=norm(result(i))^2;
end
end
We then run
min(evolve(2, build_laplacian(500)))
and get 0. In fact, evolve(2, build_laplacian(500)))(60) gives something around e-100 or less (as expected).
The Question
Does anyone know what would be responsible for such a large discrepancy between NumPy and Octave (again, I haven't tested the code with MATLAB, but I'd expect to see similar results).
Of course, one can also compute the matrix exponential by first diagonalizing the matrix. I have done this and have gotten similar or worse results (with NumPy).
EDITS
My scipy version is 0.14.0. I am aware that Octave/MATLAB use the Pade approximation scheme, and am familiar with this algorithm. I am not sure what scipy does, but we can try the following.
Diagonalize the matrix with numpy's eig or eigh (in our case the latter works fine since the matrix is Hermitian). As a result we get two matrices: a diagonal matrix D, and the matrix U, with D consisting of eigenvalues of the original matrix on the diagonal, and U consists of the corresponding eigenvectors as columns; so that the original matrix is given by U.T.dot(D).dot(U).
Exponentiate D (this is now easy since D is diagonal).
Now, if M is the original matrix and d is the original vector d=[1]+[0]*n, we get scipy.linalg.expm(-1j*s*M).dot(d)=U.T.dot(numpy.exp(-1j*s*D).dot(U.dot(d)).
Unfortunately, this produces the same result as before. Thus this probably has something to do either with the way numpy.linalg.eig and numpy.linalg.eigh work, or with the way numpy does arithmetic internally.
So the question is: how do we increase numpy's precision? Indeed, as mentioned above, Octave seems to do a much finer job in this case.
The following code
import numpy as np
from scipy import linalg as la
import scipy
print np.__version__
print scipy.__version__
def build_laplacian(n):
lap=np.zeros([n,n])
for j in range(n-1):
lap[j+1][j]=1
lap[j][j+1]=1
lap[n-1][n-2]=1
lap[n-2][n-1]=1
return lap
def evolve(s, lap):
wave=la.expm(-1j*s*lap).dot([1]+[0]*(lap.shape[0]-1))
for i in range(len(wave)):
wave[i]=la.norm(wave[i])**2
return wave
r = evolve(2, build_laplacian(500))
print np.min(abs(r))
print r[59]
prints
1.8.1
0.14.0
0
(2.77560227344e-101+0j)
for me, with OpenBLAS 0.2.8-6ubuntu1.
So it appears your problem is not immediately reproduced. Your code examples above are not runnable as-is (typos).
As mentioned in scipy.linalg.expm documentation, the algorithm is from Al-Mohy and Higham (2009), which is different from the simpler scale-and-square-Pade in Octave.
As a consequence, the results also I get from Octave are slightly different, although the results are eps-close in matrix norms (1,2,inf). MATLAB uses the Pade approach from Higham (2005), which seems to give the same results as Scipy above.
Related
I'm getting inverse of matrix even if determinant is zero. I tried with the code below:
import numpy as np
from scipy import linalg
matrix=np.array([[5,10],[2,4]])
print(linalg.det(matrix))
linalg.inv(matrix)
The problem here is a disconnect between how different routines calculate the matrix determinant. For example:
import numpy as np
matrix=np.array([[5.,10.],[2.,4.]])
print(np.linalg.det(matrix))
print(np.linalg.slogdet(matrix))
try:
invmatrix=np.linalg.inv(matrix)
except np.linalg.LinAlgError:
print("inversion failed")
produces no exception and prints this:
-1.1102230246251625e-15
(-1.0, -34.43421547668305)
i.e. not using a direct algebraic calculation of the determinant (which scipy.linalg.det does) yields a non-zero determinant, because of accumulated floating point rounding error. Thus the standard linear algebra routines treat the matrix as non-singular and produce an incorrect inverse from an extremely poor conditioned problem.
(Tested with numpy version 1.15.4 and scipy version 1.1.0)
I am trying to use code which uses Bessel function zeros for other calculations. I noticed the following piece of code produces results that I consider unexpected.
import scipy
from scipy import special
scipy.special.jn_zeros(1,2)
I would expect the result from this call to be
array([0., 3.83170597])
instead of
array([3.83170597, 7.01558667])
Is there a reason a reason why the root at x=0.0 is not being returned?
From what I can see the roots are symmetric along the x-axis except for any found at the origin, but I do not think this would be enough of a reason to leave off the root completely.
The computer I am using has python version 2.7.10 installed and is using scipy version 0.19.0
P.S. the following function is what I am trying to find the zeros of
scipy.special.j1
It appears to be convention to not count the zero at zero, see for example ħere. Maybe it is considered redundant?
I'm getting ValueError: Linkage 'Z' uses the same cluster more than once. when trying to get flat clusters in Python with scipy.cluster.hierarchy.fcluster. This error happens only sometimes, usually only with really big matrices ie 10000x10000.
import scipy.cluster.hierarchy as sch
Z = sch.linkage(d, method="ward")
# some computation here, returning n (usually between 5-30)
clusters = sch.fcluster(Z, t=n, criterion='maxclust')
Why does it happen? How can I avoid it? Unfortunately I couldn't find any useful info by googling...
EDIT Error occurs also when trying to get dendrogram.
No such error appear if method='average' is used.
It seems using fastcluster instead of scipy.cluster.hierarchy solves the problem. In addition, fastcluster implementation is slightly faster than scipy.
For more details have a look at the paper.
import fastcluster
Z = fastcluster.linkage(d, method="ward")
# some computation here, returning n (usually between 5-30)
clusters = fastcluster.fcluster(Z, t=n, criterion='maxclust')
Python, NumPy and R all use the same algorithm (Mersenne Twister) for generating random number sequences. Thus, theoretically speaking, setting the same seed should result in same random number sequences in all 3. This is not the case. I think the 3 implementations use different parameters causing this behavior.
R
>set.seed(1)
>runif(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
Python
In [3]: random.seed(1)
In [4]: [random.random() for x in range(5)]
Out[4]:
[0.13436424411240122,
0.8474337369372327,
0.763774618976614,
0.2550690257394217,
0.49543508709194095]
NumPy
In [23]: import numpy as np
In [24]: np.random.seed(1)
In [25]: np.random.rand(5)
Out[25]:
array([ 4.17022005e-01, 7.20324493e-01, 1.14374817e-04,
3.02332573e-01, 1.46755891e-01])
Is there some way, where NumPy and Python implementation could produce the same random number sequence? Ofcourse as some comments and answers point out, one could use rpy. What I am specifically looking for is to fine tune the parameters in the respective calls in Python and NumPy to get the sequence.
Context: The concern comes from an EDX course offering in which R is used. In one of the forums, it was asked if Python could be used and the staff replied that some assignments would require setting specific seeds and submitting answers.
Related:
Comparing Matlab and Numpy code that uses random number generation From this it seems that the underlying NumPy and Matlab implementation are similar.
python vs octave random generator: This question does come fairly close to the intended answer. Some sort of wrapper around the default state generator is required.
use rpy2 to call r in python, here is a demo, the numpy array data is sharing memory with x in R:
import rpy2.robjects as robjects
data = robjects.r("""
set.seed(1)
x <- runif(5)
""")
print np.array(data)
data[1] = 1.0
print robjects.r["x"]
I realize this is an old question, but I've stumbled upon the same problem recently, and created a solution which can be useful to others.
I've written a random number generator in C, and linked it to both R and Python. This way, the random numbers are guaranteed to be the same in both languages since they are generated using the same C code.
The program is called SyncRNG and can be found here: https://github.com/GjjvdBurg/SyncRNG.
I'm a Python newbie coming from using MATLAB extensively. I was converting some code that uses log2 in MATLAB and I used the NumPy log2 function and got a different result than I was expecting for such a small number. I was surprised since the precision of the numbers should be the same (i.e. MATLAB double vs NumPy float64).
MATLAB Code
a = log2(64);
--> a=6
Base Python Code
import math
a = math.log2(64)
--> a = 6.0
NumPy Code
import numpy as np
a = np.log2(64)
--> a = 5.9999999999999991
Modified NumPy Code
import numpy as np
a = np.log(64) / np.log(2)
--> a = 6.0
So the native NumPy log2 function gives a result that causes the code to fail a test since it is checking that a number is a power of 2. The expected result is exactly 6, which both the native Python log2 function and the modified NumPy code give using the properties of the logarithm. Am I doing something wrong with the NumPy log2 function? I changed the code to use the native Python log2 for now, but I just wanted to know the answer.
No. There is nothing wrong with the code, it is just because floating points cannot be represented perfectly on our computers. Always use an epsilon value to allow a range of error while checking float values. Read The Floating Point Guide and this post to know more.
EDIT - As cgohlke has pointed out in the comments,
Depending on the compiler used to build numpy np.log2(x) is either computed by the C library or as 1.442695040888963407359924681001892137*np.log(x) See this link.
This may be a reason for the erroneous output.