I am using sympy to do symbolic matrix multiplication of 13 2x2 matrices (for optics). The resulting matrix is of course a 2x2 matrix but is huge.
I am using pprint() in order to display stuff in a nice manner.
Problem is that pprint is basically "splitting" the matrix over many rows making it basically unreadable. To put things into perspective, below is the first element of the matrix as it is pretty printed, so imagine how the whole thing is going to look like.
Any tips, tricks to pretty print the matrix in a continuous way?
Many thanks,
P.S; I am using jupyter notebook
This is probably a little late. After over an hour searching for this tiny problem, I finally found a fix: As stated in their internal documentation for pretty_print (pprint is essentially a wrapper for this):
num_columns : int or None, optional (default=None)
Number of columns before line breaking (default to None which reads
the terminal width), useful when using SymPy without terminal.
I would recommend setting the limit to something you will never exceed, e.g. 10,000 or even 100,000. This at least worked for me:
pprint(expression, num_columns=10_000)
Related
I am working on a project related to gravitational lensing, for which I need to evaluate the confluent hypergeometric function 1F1(a,b,z) for an array z of length ~ 10^8 complex points, a = 1+0.48j and b = 1. I am looking for an efficient way to evaluate this on large array sizes. The scipy implementation is fast but does not accept complex arguments for a and b.
mpmath seems to be the best way to calculate 1F1 for complex parameters but mpmath.hyp1f1 does not accept array values. The best workaround I found for this was to use np.vectorize or np.frompyfunc to allow passing a NumPy array as a parameter. However, this is extremely slow and would take days to execute (even with gmpy2 installed). I assume this is because mpmath functions are always slow on large array sizes.
a nonpython implementation would be fine as well, as long as I can somehow save the result on disk and read it into my python code. I have seen some implementations (for example https://www.math.ucla.edu/~mason/research/pearson_final.pdf) which could possibly work but I'm not sure.
Another possible way would be to interpolate the function
(consecutive points in my input array are extremely close) but I'm not sure what would be the best way to do that.
Thanks!
I was having a very similar problem than you have.
I figured out that the mpmath package has a "hidden" set of function with (only) float precision, which one can access by writing fp. upfront. This does not exist for hyp1f1 but for the more general hyper. Meaning there is a fp.hyper in the mpmath package which is with fp.hyper([a],[b],z) equivalent to hyper1f1(a,b,z), but is a lot faster.
If you vectorize this with np.vectorize this should make your calculation substansially faster.
Disclaimer: I got an error message saying that some complex value is converted to real by dropping the imaginary part when evaluating this, but so far the results i have gotten seem sensible and compatible to the hyper1f1(a,b,z) values.
Added: It seems that fp.hyper does not like getting numpy datatypes even if they are scalars, as in the case of a,b,z beeing numpy scalars (for example one element of an numpy array) it will simply return 1 without giving an error message independent of the actual input. If you use np.vectorize however everything should be fine.
Eitherway: Use at own risc.
My problem is to perform 3 matrix multiplications on a 3D numpy array A too large to fit in a single processor. In tensorial form I want A_ijk B_km C_jn D_ip (B, C, and D can all fit in memory). I want to know if dask is appropriate for this task (or if another tool might be more suited).
I believe the best approach is to split this operation into each multiplication, and make sure that they are all local. This link has a really useful diagram that summarises what I'm talking about http://www.2decomp.org/1d_mode.html.
In more detail: First, to do A_ijk B_km, I should distribute A over the first two axes, and perform the matrix multiplication over each pencil locally (the first step in the diagram).
Then, I need to transpose the array, making the j axis local to each processor (and splitting over the k (now m) axis), to then perform the next multiplication. (So going from the first to the second step in the diagram). This is where I wonder if dask could help.
I'm aware that this can be done in principle using mpi4py, but the steps are pretty non-trivial, whereas dask arrays have helpful rechunk and transpose methods, which feel relevant to this application.
Does this seem like something well-suited to dask?
If not, is anyone aware of any python libraries that can perform these steps? I know that fftw has routines for doing just this, but I don't know how to write the C-code necessary, or how to get it to interface with python and numpy.
Thanks for any help.
For anyone else in the future, mpi4py does have a transpose method. But it's called Alltoall/Alltoallv. It's not explained in the documentation or tutorial on mpi4py. I found out about it at another tutorial: https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:mpi4py.
Dask implements einsum, which may be what you are after, and there is, of course matmul, if you want to write out the operation. So long as your large matrix A is a Dask array, with reasonable chunk sizes, Dask will parcel out your work without running out of memory.
I have a numpy array of values and I wanted to scale (zoom) it. With floats I was able to use scipy.ndimage.zoom but now my array contains complex values which are not supported by scipy.ndimage.zoom. My workaround was to separate the array into two parts (real and imaginary) and scale them independently. After that I add them back together. Unfortunately this produces a lot of tiny artifacts in my 'image'. Does somebody know a better way? Maybe there also exists a python library for this? I couldn't find one.
Thank you!
This is not a good answer but it seems to work quite well. Instead of using the default parameters for the zoom method, I'm using order=0. I then proceed to deal with the real and imaginary part separately, as described in my question. This seems to reduce the artifacts although some smaller artifacts remain. It is by no means perfect and if somebody has a better answer, I would be very interested.
I am trying to generate a few very large arrays, and at least one is ending up being singular, which is made obvious by this familiar error message:
File "C:\Anaconda3\lib\site-packages\numpy\linalg\linalg.py", line 90, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
LinAlgError: Singular matrix
Of course I do not want my array to be singular, but I am more interested in determining WHY my array is singular. What I mean by this is that I would like to have a way to answer the following questions without manually checking each entry:
Is the array square? (I believe this is returned by a separate error message, which is convenient, but I'll include this as a singularity property anyway)
Are any rows populated only by zeros?
Are any columns populated only by zeros?
Are any rows not linearly independent of all other rows?
For relatively small arrays, the first two conditions are easily answered by visual inspection. However, because my arrays are substantially large, I do not want to have to go in and manually check each array element to see if any of those conditions are met.
I tried pulling up the linalg.py script to see if I could see how it determines a matrix to be singular, but I could not tell how it determines a matrix to be singular.
(this paragraph was edited for clarity)
I also tried searching for info online, and nothing seemed to be of help. Most topics seemed to only answer some form of the following questions/objectives: 1) "I want Python to tell me if my matrix is singular" or 2) why is Python giving me this error message". Because I already know that my matrix/matrices are singular, neither of these two questions are of importance to me.
Again, I am not looking for an answer along the lines of, "Oh, well this particular matrix is singular because . . .". I am looking for a method I can use immediately on ANY singular matrix to determine (especially for large arrays) what is causing the singularity.
Is there a built-in Python function that does this, or is there some other relatively simple way to do this before I try to create a function that will do this for me?
Singular matrices have at least one eigenvalue equal to zero. You can create a diagonalizable singular matrix by starting from its eigenvalue decomposition:
A = V D V^{-1}
D is the diagonal matrix of eigenvalues. So create any matrix V, the diagonal matrix D that has at least one zero in the diagonal, and then A will be singular.
The traditional way of checking is by computing an SVD. This is what the function numpy.linalg.matrix_rank uses to compute the rank, and you can then check if matrix_rank(M) == M.shape[0] (assuming a square matrix).
For more information, check out this excellent answer to a similar question for Matlab users.
The rank of the matrix will tell you how many rows aren't zero or linear combinations, but not specifically which ones. It's a relatively fast operation, so it might be useful as a first-pass check.
I'm using python to work with large-ish (approx 2000 x 2000) matrices, where each I, J point in the matrix represents a single pixel.
The matrices themselves are sparse (ie a substantial portion of them will have zero values), but when they are updated they tend to be increment operations, to a large number of adjacent pixels in a rectangular 'block', rather than random pixels here or there (a property i do not currently use to my advantage..).
Afraid a bit new to matrix arithmetic, but I've looked into a number of possible solutions, including the various flavours of scipy sparse matrices. So far co-ordinate (COO) matrices seem to be the most promising.
So for instance where I want to increment one block shape, I'd have to do something along the lines of:
>>> from scipy import sparse
>>> from numpy import array
>>> I = array([0,0,0,0])
>>> J = array([0,1,2,3])
>>> V = array([1,1,1,1])
>>> incr_matrix = sparse.coo_matrix((V,(I,J)),shape=(100,100))
>>> main_matrix += incr_matrix #where main_matrix was previously defined
In the future, i'd like to have a richer pixel value representation in anycase (tuples to represent RGB etc), something that numpy array doesnt support out of the box (or perhaps I need to use this).
Ultimately i'll have a number of these matrices that I would need to do simple arithmitic on, and i'd need the code to be as efficient as possible -- and distributable, so i'd need to be able to persist and exchange these objects in a small-ish representation without substantial penalties. I'm wondering if this is the right way to go, or should I be looking rolling my own structures using dicts etc?
The general rule is, get the code working first, then optimize if needed...
In this case, use a normal numpy 2000x2000 array, or 2000x2000x3 for RGB. This will be much easier and faster to work with, is only a small memory requirement, and has many other advantages, for example, you can use the standard image processing tools, etc.
Then, if needed, "to persist and exchange these objects", you can just compress them using gzip, pytables, jpeg, or whatever, but there's no need to limit your data manipulation based storage requirements.
This way you get both faster processing and better compression.
I would say, yes, this is the way to go. Definitely over building something out of dictionaries! When building a "vector", array, then use a structured array, i.e. define your own dtype:
rgbtype = [('r','uint8'),('g','uint8'),('b','uint8')]
when incrementing your blocks, it will look something like this:
main_matrix['r'][blk_slice] += incr_matrix['r']
main_matrix['g'][blk_slice] += incr_matrix['g']
main_matrix['b'][blk_slice] += incr_matrix['b']
Update:
It looks like you can't do matrix operations with a coo_matrix, they exist simply as a convenient way to populate a sparse matrix. You have to convert them to another (sparse) matrix type before doing the updates. documentation
You might want to consider looking into a quadtree as an implementation. The quadtree structure is pretty efficient at storing sparse data, and has the added advantage that if you're working with structures composed of lots of blocks of similar data the representation can be very compact. I'm not sure if this will be particularly applicable to what you're doing, since I don't know what you mean by "working in blocks," but it's certainly worth checking out as an alternative sparse matrix implementation.