numpy and pysparse floating point representation problem

I have started using numpy along with the pysparse package, which interfaces UMFPACK, but I am having a problem with the floating point results from numpy. For context, this is a Lanczos eigenvalue solver for structural problems.
When I do the same operations in MATLAB I get different results. The values are on the order of 1e-6 to 1e-8, and with MATLAB's representation I get the right eigenvalues. The NumPy and PySparse results are not far off, at least in order of magnitude, but using them to build the tridiagonal matrix whose eigenvalues I need is where the problem shows up. I suspect the issue is the floating point representation, but I do not understand what is going wrong or how to fix it, if that is possible. I tried using 'Float64' as my datatype, but that does not change the results. For example:
q = ones(n, dtype = 'Float64')
One more question: what is the most mature sparse package for Python, and what kinds of interfaces does it provide, if any? As I said, PySparse seemed fine to me at first sight...

float64 is the default data type in NumPy, so specifying it explicitly changes nothing. You could try using float128 for more precision, but be warned that certain functions (and basically everything on Windows) will coerce it to float64 anyway.
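A quick way to check this yourself is sketched below. It only uses standard NumPy calls; the variable names are made up for illustration, and np.longdouble is used instead of float128 because the latter name is not defined on every platform.

```python
import numpy as np

# Arrays created without a dtype are already float64.
q_default = np.ones(5)
q_explicit = np.ones(5, dtype=np.float64)
same_dtype = q_default.dtype == q_explicit.dtype

# Machine epsilon shows how much precision each type really offers.
# On some platforms longdouble is just an alias for float64.
eps64 = np.finfo(np.float64).eps
eps_long = np.finfo(np.longdouble).eps

print(same_dtype, eps64, eps_long)
```

If eps_long prints the same value as eps64, extended precision is not available on that machine and switching dtypes cannot help.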
I would recommend using scipy.sparse for your sparse eigenvector problems. I have tried both PySparse and scipy.sparse, and I would conclude that although PySparse is easier to use, scipy.sparse is more mature.
Here's the sparse linear algebra documentation: http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html
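As a rough sketch of what this looks like in practice, the snippet below asks scipy.sparse.linalg.eigsh (a Lanczos-type solver for symmetric matrices) for the few smallest eigenvalues of a sparse matrix. The matrix K is a made-up 1D stiffness-like test matrix, not the one from the question, chosen because its eigenvalues are known in closed form.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

n = 200
# Hypothetical test matrix: tridiagonal (-1, 2, -1), stored sparsely.
K = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")

# Shift-invert around sigma=0 returns the eigenvalues closest to zero.
vals, vecs = eigsh(K, k=4, sigma=0, which="LM")
vals = np.sort(vals)

# Closed-form eigenvalues of this particular matrix, for comparison.
exact = 2.0 - 2.0 * np.cos(np.arange(1, 5) * np.pi / (n + 1))
max_err = np.max(np.abs(vals - exact))
print(vals, max_err)
```

Everything here runs in float64, and the computed values agree with the closed form closely, which suggests the data type alone is unlikely to be the cause of wrong eigenvalues.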

Related

Large sparse matrix inversion on Python

I'm currently working with a least-squares algorithm in Python for some geodetic calculations.
I chose Python (which is not the fastest) and it works pretty well. However, my code has to invert large sparse symmetric matrices (not positive definite, so I can't use Cholesky) (image below). I currently use np.linalg.inv(), which uses the LU decomposition method.
I'm pretty sure there is some optimization to be done in terms of speed.
I thought about using the Cuthill-McKee algorithm to rearrange the matrix before taking its inverse. Do you have any ideas or advice?
Thank you very much for your answers!

The good news is that if you're using any of the popular Python libraries for linear algebra (namely, numpy), the speed of Python really doesn't matter for the math, because it's all done natively inside the library.
For example, when you write matrix_prod = matrix_a @ matrix_b, that's not triggering a bunch of Python loops to multiply the two matrices, but using numpy's internal implementation (which I think delegates to a compiled BLAS/LAPACK library).
The scipy.sparse.linalg module has you covered when it comes to solving sparse systems of equations given as sparsely stored matrices (which is what you do with the inverse of a matrix). If you want to use sparse matrices, that's the way to go. Note that there are matrices that are sparse in mathematical terms (i.e., most entries are 0), and matrices that are stored as sparse matrices, which means you avoid storing millions of zeros. Numpy itself doesn't have sparsely stored matrices, but scipy does.
If your matrix is densely stored but mathematically sparse, i.e. you're using standard numpy ndarrays to store it, then you won't get any faster by implementing anything in Python. The theoretical complexity gains will be outweighed by the practical slowness of Python compared to highly optimized inversion.
Inverting a sparse matrix usually destroys the sparsity. Also, you never invert a matrix if you can avoid it at all! For a sparse matrix, solving the linear system Ax = b for x, with A your matrix and b a known vector, is much faster than computing A⁻¹ first. So, regarding
"I'm currently working with a least-square algorithm on Python, regarding some geodetic calculations."
since least squares doesn't need the inverse matrix, simply don't calculate it, ever. The point of least squares is to find a solution that's as close as possible, even if your matrix isn't invertible, which can very well be the case for sparse matrices!
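To make the "solve, don't invert" advice concrete, here is a minimal sketch using scipy.sparse.linalg.spsolve. The matrix A below is an invented symmetric indefinite test matrix (not the geodetic one from the question), and the dense inverse is computed only to show that both routes give the same answer.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

n = 500
# Hypothetical symmetric, indefinite, sparsely stored matrix:
# the diagonal alternates between +4 and -4, off-diagonals are 1.
main = np.where(np.arange(n) % 2 == 0, 4.0, -4.0)
off = np.ones(n - 1)
A = diags([off, main, off], offsets=[-1, 0, 1], format="csc")
b = np.ones(n)

# Solve A x = b directly with a sparse LU factorization.
x = spsolve(A, b)
residual = np.linalg.norm(A @ x - b)

# For comparison only: the dense inverse gives the same x,
# but needs n*n storage and more work.
x_inv = np.linalg.inv(A.toarray()) @ b
print(residual, np.allclose(x, x_inv))
```

If the system is rectangular or singular, scipy.sparse.linalg.lsqr is one option for computing a least-squares solution without ever forming an inverse.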

Efficient way to compute the confluent Hypergeometric function for large arrays (~ 10^8 points) with complex parameters

I am working on a project related to gravitational lensing, for which I need to evaluate the confluent hypergeometric function 1F1(a,b,z) for an array z of length ~ 10^8 complex points, a = 1+0.48j and b = 1. I am looking for an efficient way to evaluate this on large array sizes. The scipy implementation is fast but does not accept complex arguments for a and b.
mpmath seems to be the best way to calculate 1F1 for complex parameters but mpmath.hyp1f1 does not accept array values. The best workaround I found for this was to use np.vectorize or np.frompyfunc to allow passing a NumPy array as a parameter. However, this is extremely slow and would take days to execute (even with gmpy2 installed). I assume this is because mpmath functions are always slow on large array sizes.
A non-Python implementation would be fine as well, as long as I can somehow save the result on disk and read it into my Python code. I have seen some implementations (for example https://www.math.ucla.edu/~mason/research/pearson_final.pdf) which could possibly work, but I'm not sure.
Another possibility would be to interpolate the function (consecutive points in my input array are extremely close), but I'm not sure what the best way to do that would be.
Thanks!

I was having a very similar problem to yours.
I figured out that the mpmath package has a "hidden" set of functions that work with (only) float precision, which one can access by writing fp. up front. This does not exist for hyp1f1, but it does for the more general hyper. That is, there is an fp.hyper in the mpmath package, and fp.hyper([a],[b],z) is equivalent to hyp1f1(a,b,z) but a lot faster.
If you vectorize this with np.vectorize, it should make your calculation substantially faster.
Disclaimer: I got a warning saying that some complex value is converted to real by dropping the imaginary part when evaluating this, but so far the results I have gotten seem sensible and compatible with the hyp1f1(a,b,z) values.
Added: It seems that fp.hyper does not like getting numpy datatypes even if they are scalars. When a, b, z are numpy scalars (for example, one element of a numpy array), it will simply return 1 regardless of the actual input, without giving an error message. If you use np.vectorize, however, everything should be fine.
Either way: use at your own risk.
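A small sketch of this approach follows. The wrapper name hyp1f1_fp and the sample points are invented for illustration, and the explicit conversion to Python's built-in complex is a precaution against the numpy-scalar issue mentioned above, not something mpmath documents as required. The result is checked against the regular mpmath.hyp1f1 on a few points.

```python
import numpy as np
from mpmath import fp, hyp1f1

a = 1 + 0.48j
b = 1

def hyp1f1_fp(z):
    # Pass a plain Python complex, not a numpy scalar, to fp.hyper.
    return complex(fp.hyper([a], [b], complex(z)))

# np.vectorize is a convenience loop; it does not run in compiled code.
hyp1f1_vec = np.vectorize(hyp1f1_fp, otypes=[complex])

z = np.array([0.1 + 0.2j, -0.5 + 1.0j, 1.5 - 0.3j])
fast = hyp1f1_vec(z)

# Reference values from the regular (slower) mpmath routine.
reference = np.array([complex(hyp1f1(a, b, complex(zi))) for zi in z])
max_diff = np.max(np.abs(fast - reference))
print(fast, max_diff)
```

It is worth repeating this comparison on a sample of your real inputs, since these few small |z| values say nothing about accuracy for large arguments.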

Eigenvectors of a large sparse matrix in tensorflow

I was wondering if there is a way to calculate the first few eigenvectors of a very large sparse matrix in tensorflow, hoping that it might be faster than scipy's implementation of ARPACK, which doesn't seem to support parallel computing, at least as far as I have noticed.

I believe you should rather look into petsc4py or slepc4py.
They are Python bindings for PETSc (Portable, Extensible Toolkit for Scientific Computation) and SLEPc (Scalable Library for Eigenvalue Problem Computations).
PETSc and SLEPc support MPI, and therefore petsc4py and slepc4py do too.
I believe you will find useful starting points in their examples.

fft2 different result in numpy and matlab

I was trying to port some code from Python to MATLAB, but I encountered an inconsistency between numpy's fft2 and MATLAB's fft2:
peak =
4.377491037053e-223 3.029446976068e-216 ...
1.271610790463e-209 3.237410810582e-203 ...
(The full data is too large to list here; it can be accessed at https://drive.google.com/file/d/0Bz1-hopez9CGTFdzU0t3RDAyaHc/edit?usp=sharing)
Matlab:
fft2(peak) --(sample result)
12.5663706143590 -12.4458341615690
-12.4458341615690 12.3264538927637
Python:
np.fft.fft2(peak) --(sample result)
12.56637061 +0.00000000e+00j -12.44583416 +3.42948517e-15j
-12.44583416 +3.35525358e-15j 12.32645389 -6.78073635e-15j
Please help me understand why this happens, and suggest how to fix it.

The Fourier transform of a real, even function is real and even (ref). Therefore, it appears that your FFT should be real. Numpy is probably just struggling with the numerics, while MATLAB may outright check for symmetry and force the solution to be real.
MATLAB uses FFTW3, while my research indicates Numpy uses a library called FFTPack. FFTW is one of the standards for FFT performance and uses a number of tricks to work quickly and perform calculations to the best precision possible. You have incredibly tiny numbers in your input, and this poses a number of numerical challenges that any library will be hard pressed to resolve.
You might consider executing the Python code against an FFTW3 wrapper like pyFFTW3 and see if you get similar results.
It appears that your input data is Gaussian, real, and even, in which case we do expect the FFT2 of the signal to be real and even. If all your inputs are this way, you could just take the real part, or round to a certain precision. I would trust MATLAB's FFTW code over the Python code.
Or you could just ignore it. The differences are quite small, and a value of 3e-15i is effectively zero for most applications. If you have automated the comparison, consider calling the results equivalent if the mean square error of all the entries is less than some threshold (say 1e-8 or 1e-15 or 1e-20).
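The tolerance-based comparison can be sketched as follows. The input here is a small synthetic Gaussian built to be exactly even in the periodic sense, standing in for the peak array from the question, and the 1e-12 tolerance is an arbitrary choice for this example.

```python
import numpy as np

N = 16
sigma = 2.0
n = np.arange(N)
# Periodic distance from index 0, so that g[k] == g[-k mod N].
d = np.minimum(n, N - n)
g = np.exp(-d**2 / (2.0 * sigma**2))
peak = np.outer(g, g)  # synthetic stand-in for the real data

F = np.fft.fft2(peak)

# The imaginary parts are rounding noise, not signal.
max_imag = np.abs(F.imag).max()
matches_real = np.allclose(F, F.real, rtol=0.0, atol=1e-12)

# If the input is known to be real and even, keep only the real part.
F_clean = F.real
print(max_imag, matches_real)
```

Comparing with np.allclose rather than exact equality is the usual way to compare floating point results that come from two different libraries.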

How do I do matrix computations in python without rounding?

I have some integer matrices of moderate size (a few hundred rows). I need to solve equations of the form Ax = b where b is a standard basis vector and A is one of my matrices. I have been using numpy.linalg.lstsq for this purpose, but the rounding errors end up being too significant.
How can I carry out an exact symbolic computation?
(PS I don't really need the code to be efficient; I'm more concerned about ease of coding.)

If your only option is to use free tools written in Python, sympy might work, but it could well be simpler to use Mathematica.

Note that if you're serious about your comment that you require your solution vector to be integer, then you're looking for something called the "integer least squares problem", which is believed to be NP-hard. There are some heuristic solvers, but it all gets very complicated.

The mpmath library has support for arbitrary-precision floating-point numbers, and supports matrix algebra: http://mpmath.googlecode.com/svn/tags/0.17/doc/build/matrices.html
Using sympy to do the computation exactly is then a second option.
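For the sympy route, a minimal sketch might look like this. The 3x3 integer matrix is made up for illustration, and LUsolve is just one of several exact solvers that sympy's Matrix class offers.

```python
from sympy import Matrix, Rational

# Hypothetical small integer matrix; entries are kept as exact integers.
A = Matrix([[2, 1, 0],
            [1, 3, 1],
            [0, 1, 4]])
b = Matrix([1, 0, 0])  # a standard basis vector

# Exact rational arithmetic: no rounding takes place.
x = A.LUsolve(b)

# The residual is exactly zero, not merely small.
residual = A * x - b
print(x.T, residual.T)
```

The entries of x are sympy Rational objects, so they can be used in later exact computations or converted with float() when an approximation is enough.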
