I have a matrix M with approximately 300 rows and columns. Each entry contains a symbolic expression, and there are around 40 symbols in total. Doing something like M*M can take a long time (hours). Is there a way to do this symbolic matrix multiplication on a gpu using sympy, or more generally in python?
You're in luck! I just levelled up on this myself a few minutes ago. The optimal solution for parallelizing SymPy expressions on GPU (or TPU) hardware is (as of mid-2021) Aesara, a well-maintained fork of the now-defunct Theano package, which was foolishly discontinued in the mistaken belief that TensorFlow was better. Spoiler alert: it wasn't.
Since the OP sadly failed to supply a minimal working example (MWE) for us to mutilate, we'll just copy-paste a mostly useless example demonstrating the core API concepts:
# SymPy -> Aesara translator, bundled with SymPy.
from sympy.printing.aesaracode import aesara_function
# Arbitrary SymPy expression.
from sympy import sin
from sympy.abc import x
expr = sin(x)/x
# Aesara function compiled from that expression (GPU-backed if Aesara is configured to use one). Yup.
f = aesara_function([x], [expr])
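Calling the compiled function is then just numbers in, numbers out. As a minimal sanity check (my own addition, not from the SymPy docs; whether it actually runs on a GPU depends on how Aesara itself is configured, not on SymPy):
import numpy as np
# dims/dtypes ask aesara_function for an elementwise function over 1-D arrays.
f_vec = aesara_function([x], [expr], dims={x: 1}, dtypes={x: 'float64'})
print(f_vec(np.linspace(0.1, 1.0, 5)))  # sin(x)/x evaluated elementwise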
Generally speaking, the optimal solution is whatever SymPy's "Numeric Computation" documentation says it is. Currently, that means Aesara; since data science in Python changes faster than a king cobra strikes, everything has probably changed again by the time you read this. The aforementioned documentation is surprisingly intelligible, so go there first. That's your optimal landing page for all things related to optimizing SymPy.
I've tried to write my own functions to calculate special functions (e.g. exponential, gamma, erf, etc.), but for the sum and product operations I used a while-loop with 10k iterations. This is very time-consuming.
Then I realized that the scipy.special.gamma() function is significantly faster than my while-loop, and I wanted to study the implementation/algorithm of the gamma function, but I couldn't find the source code online.
Is there a way to see SciPy's source code?
SciPy's source code is hosted on GitHub here: https://github.com/scipy/scipy
For floating-point arguments, SciPy uses the Cephes implementation of gamma: https://github.com/scipy/scipy/blob/main/scipy/special/cephes/gamma.c
For complex arguments, SciPy uses a Cython implementation: https://github.com/scipy/scipy/blob/main/scipy/special/_loggamma.pxd
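As a side note, you can confirm straight from the interpreter that gamma is compiled code, which is why inspect can't show you its source (a small illustration of my own):
import inspect
import scipy.special
print(type(scipy.special.gamma))  # <class 'numpy.ufunc'>: a compiled ufunc
try:
    inspect.getsource(scipy.special.gamma)
except TypeError as err:
    print(err)  # compiled objects have no Python source to display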
I'm teaching myself linear algebra, and I'm trying to learn the corresponding Numpy and Sympy code alongside it.
My book presented the following matrix:
example1 = Matrix([[3,5,-4,0],[-3,-2,4,0],[6,1,-8,0]])
with the instructions to determine if there is a nontrivial solution. The final solution would be x = x3 * Matrix([[4/3],[0],[1]]). (Using Jupyter's math mode, I used the following to represent the solution:)
$$\pmb{x} =
\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} =
\begin{bmatrix}\frac{4}{3}x_3\\0\\x_3\end{bmatrix} =
x_3\begin{bmatrix}\frac{4}{3}\\0\\1\end{bmatrix} \\
= x_3\pmb{v} \text{, where }\pmb{v} = \begin{bmatrix}\frac{4}{3}\\0\\1\end{bmatrix}$$
How can I now solve this in Sympy? I've looked through the documentation, but I didn't see anything, and I'm at a bit of a loss. I know that errors tend to be thrown when free variables are involved. Is there a way to determine nontrivial solutions and the corresponding general solution using Sympy, given that nontrivial solutions depend on free variables? Or is np.linalg generally preferred for this type of problem?
This is a linear system, so you can hand linsolve the augmented matrix directly:
>>> from sympy import Matrix, linsolve
>>> linsolve(Matrix([[3,5,-4,0],[-3,-2,4,0],[6,1,-8,0]]))
FiniteSet((4*tau0/3, 0, tau0))
Here tau0 is the free parameter that you refer to as x3.
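If you'd rather see the solution in terms of your own x3 (and recover the book's v directly), here are two closely related ways to do the same computation (my own additions):
from sympy import Matrix, linsolve, symbols
x1, x2, x3 = symbols('x1 x2 x3')
A = Matrix([[3, 5, -4], [-3, -2, 4], [6, 1, -8]])
b = Matrix([0, 0, 0])
# General solution expressed with your chosen symbols:
print(linsolve((A, b), [x1, x2, x3]))  # {(4*x3/3, 0, x3)}
# The spanning vector v of the solution space, directly:
print(A.nullspace())  # [Matrix([[4/3], [0], [1]])]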
I have been using sympy to work with systems of differential equations. I write the equations symbolically, use autowrap to compile them through Cython, and then pass the resulting function to the scipy ODE solver. One of the major benefits of doing this is that I can compute the jacobian symbolically using the sympy jacobian function, compile it, and pass it to the ODE solver as well.
This has been working great for systems of about 30 variables. Recently I tried it with 150 variables, and what happened was that I ran out of memory when compiling the jacobian function. This is on Windows with Anaconda and the Microsoft Visual C++ 14 tools for Python. Basically, during compilation of the jacobian, which is now a 22000-element vector, memory usage during the linking step went up to about 7GB (on my 8GB laptop) before finally crashing out.
Does someone have some suggestions before I go and try on a machine with more memory? Are other operating systems or other C compilers likely to improve the situation?
I know lots of people do this type of work, so if there's an answer, it will be beneficial to a good chunk of the community.
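For readers unfamiliar with this workflow, here is a minimal sketch of the kind of pipeline described, on a toy 2-variable system (my own illustration, assuming a working Cython/C toolchain; the OP's real system has 150 variables):
from sympy import symbols, sin, Matrix
from sympy.utilities.autowrap import autowrap
y1, y2 = symbols('y1 y2')
rhs = Matrix([y2, -sin(y1) - y1*y2])  # toy nonlinear right-hand side
jac = rhs.jacobian([y1, y2])          # Matrix([[0, 1], [-y2 - cos(y1), -y1]])
# Compile the jacobian to a C extension; returns a callable producing a numpy array.
jac_fn = autowrap(jac, backend='cython', args=(y1, y2))
print(jac_fn(0.5, 0.1))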
Edit: response to some of Jonathan's comments:
Yes, I'm fully aware that this is an N^2 problem. The jacobian is a matrix of all partial derivatives, so it will have size N^2. There is no real way around this scaling. However, a 22000-element array is not nearly at the level that would create a memory problem during runtime -- I only have the problem during compilation.
Basically there are three levels that we can address this at.
1) solve the ODE problem without the jacobian, or somehow split up the jacobian so as not to have a 150x150 matrix. That would address the very root, but it certainly limits what I can do, and I'm not yet convinced that it's impossible to compile the jacobian function.
2) change something about the way sympy automatically generates C code -- splitting it into multiple chunks, using more functions for intermediate expressions -- to somehow make the .c file smaller. People with more sympy experience might have some ideas on this.
3) change something about the way the C is compiled, so that less memory is needed.
I thought that by posting a separate question oriented more around #3 (literal referencing of large array -- compiler out of memory), I would get a different audience answering. That is in fact exactly what happened. Perhaps the answer to #3 is "you can't", but that's also useful information.
Following a lot of the examples posted at http://www.sympy.org/scipy-2017-codegen-tutorial/ I was able to get this to compile.
The key things were
1) instead of using autowrap, write the C code directly, with more control over it. Among other things, this allows passing the argument list as a vector instead of expanding it. This took some effort to get working (setting up the compiler flags through distutils, etc.), but in the end it worked well. Having the repo from the course linked above as an example helped a lot.
2) using common subexpression elimination (sympy.cse) to dramatically reduce the size of the expressions for the jacobian elements (illustrated in the sketch below).
(1) by itself didn't do that much to help in this case (although I was able to use it to vastly improve the performance of smaller models); the generated code was still 200 MB instead of the original 300 MB. But combining it with (2), the cse, I was able to get it down to a meager 1.7 MB (despite 14000 temporary variables).
The cse takes about 20-30 minutes on my laptop. After that, it compiles quickly.
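For anyone who hasn't used it, here's a toy illustration of the cse step from (2) (my own minimal example; the real model has thousands of shared subexpressions):
from sympy import symbols, sin, cse
x, y = symbols('x y')
exprs = [sin(x + y) + (x + y)**2, (x + y)**3]
replacements, reduced = cse(exprs)
print(replacements)  # [(x0, x + y)] -- the shared piece, computed once
print(reduced)       # [x0**2 + sin(x0), x0**3]
Emitting C for the replacements as temporary variables, followed by the reduced expressions, is what shrinks the generated .c file so dramatically.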
I'm playing around with SymPy and it is very powerful. However, I would like to get it to 'slow down' and work through an expression one piece at a time instead of evaluating most of it at once. For instance, given an input string (assuming the correct form) like
9x-((17-3)(4x)) - 8(34x)
I would like it to first evaluate
9x-((14)(4x)) - 8(34x)
And then
9x-(56x) - 8(34x)
And then
9x-(56x) - 272x
And so on.
Another example,
from sympy import *
x = symbols('x')  # x must be defined before use
s = (30*(5*(5-10)-10*x))+10
s2 = expand(s, basic=False)
Gives me -300*x - 740 in one step, whereas I just want a single multiplication carried out at a time.
Reading the ideas document produced as a result of the Google Summer of Code, this appears to be something yet to be added to the library. As it stands, there is no way of doing this for your example without completely coding something yourself.
The issue of converting algorithms that do not mirror human workings into discrete steps is discussed and highlighted in that document. I'm not sure whether that's an issue in the implementation of expansion, but it's certainly an issue for other algorithms, which machines compute differently for reasons of efficiency.
tl;dr: The library doesn't support step-by-step breakdowns for your example. Only the manualintegrate function currently has step-by-step workings.
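For reference, that one existing step-by-step facility can be inspected like so (integration rather than expansion; integral_steps lives in an internal module, so treat this as a sketch):
from sympy import Symbol, sin
from sympy.integrals.manualintegrate import integral_steps
x = Symbol('x')
# Returns a nested rule object (e.g. integration by parts) describing each step.
print(integral_steps(x * sin(x), x))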
I have some integer matrices of moderate size (a few hundred rows). I need to solve equations of the form Ax = b where b is a standard basis vector and A is one of my matrices. I have been using numpy.linalg.lstsq for this purpose, but the rounding errors end up being too significant.
How can I carry out an exact symbolic computation?
(PS I don't really need the code to be efficient; I'm more concerned about ease of coding.)
If you're restricted to free tools written in Python, sympy might work, but otherwise it could well be simpler to use Mathematica.
Note that if you're serious about your comment that you require your solution vector to be integer, then you're looking for the "integer least squares problem", which is believed to be NP-hard. There are some heuristic solvers, but it all gets very complicated.
The mpmath library supports arbitrary-precision floating-point numbers and matrix algebra: http://mpmath.googlecode.com/svn/tags/0.17/doc/build/matrices.html
Using sympy to do the computation exactly is then a second option.
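A minimal sketch of the sympy route, with a toy integer matrix standing in for your few-hundred-row ones (the arithmetic is exact because the entries stay SymPy Integers/Rationals rather than floats):
from sympy import Matrix, eye
A = Matrix([[2, 1], [5, 3]])  # toy invertible integer matrix
b = eye(2)[:, 0]              # standard basis vector e1
x = A.solve(b)                # exact rational solve, no rounding error
print(x)           # Matrix([[3], [-5]])
print(A * x == b)  # True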