I'm having trouble using the pinv function from the numpy.linalg module. I want to compute the pseudo-inverse of a rectangular matrix A:
import numpy as np

try:
    Binv = np.linalg.pinv(A)
except:
    print("an error occurs")
When I run the code, no exception is raised, but the following red text appears in my Python prompt: init_dgesdd failed init.
However, when I use my code with other matrices in other contexts (different shapes, different conditioning values...), it works fine.
After investigating the error, it seems to come from memory issues. When I use a matrix with a (105 x 177144) shape, it works. But when I use a matrix with a (105 x 178668) shape, it does not work.
Moreover, a quick look at the numpy.linalg.umath_linalg.c.src code shows that the error mentioned above is raised when allocation of the memory buffer fails. This memory buffer is used to store U, S, VT and all the intermediate arrays needed during the SVD computation.
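For scale, here is a back-of-the-envelope estimate of just the U, S and VT output buffers for the failing shape, assuming pinv's economy-size SVD (full_matrices=False); the real dgesdd buffer also holds intermediate workspace on top of this:
import numpy as np

m, n = 105, 178668                       # the failing shape from above
k = min(m, n)

# economy-size SVD outputs: U is (m, k), s is (k,), Vt is (k, n)
output_bytes = (m * k + k + k * n) * 8   # float64
print(output_bytes / 1e6)                # ~150 MB before any intermediate workspace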
Related
I am running several TensorFlow inferences using sess.run() in a loop, and it happens that some inferences are too heavy for my GPU.
I get errors like:
2019-05-23 15:37:49.582272: E tensorflow/core/common_runtime/executor.cc:623]
Executor failed to create kernel. Resource exhausted: OOM when allocating tensor of shape [306] and type float
I would like to be able to catch these specific OutOfMemory errors but not other errors (which may be due to a wrong input format or a corrupted graph).
Obviously, a structure similar to:
try:
    sess.run(node_output, feed_dict={node_input: value_input})
except:
    do_outOfMemory_specific_stuff()
does not work, since other kinds of errors will also lead to a call to the do_outOfMemory_specific_stuff function.
Any idea how to catch these OutOfMemory errors?
You should be able to catch it via:
...
except tf.errors.ResourceExhaustedError as e:
    ...
according to this documentation.
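Putting that together with the loop from the question, a minimal sketch could look like this (node_output, node_input, value_input and do_outOfMemory_specific_stuff are the names from the question):
import tensorflow as tf

try:
    sess.run(node_output, feed_dict={node_input: value_input})
except tf.errors.ResourceExhaustedError:
    do_outOfMemory_specific_stuff()  # only OOM errors land here
# any other error (wrong input format, corrupted graph, ...) still propagates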
I'm trying to implement a transform which at some stage uses a lookup table less than 1 KB in size. This seems to me like it shouldn't pose a problem for a modern graphics card.
But the code below fails with an unknown error:
from numba import cuda, vectorize
import numpy as np

tmp = np.random.uniform(0, 100, 1000000).astype(np.int16)
tmp_device = cuda.to_device(tmp)

lut = np.arange(100).astype(np.float32) * 2.5
lut_device = cuda.to_device(lut)

@cuda.jit(device=True)
def lookup(x):
    return lut[x]

@vectorize("float32(int16)", target="cuda")
def test_lookup(x):
    return lookup(x)

test_lookup(tmp_device).copy_to_host()  # <-- fails with cuMemAlloc returning UNKNOWN_CUDA_ERROR
What am I doing against the spirit of numba.cuda?
Even replacing lookup with the following simplified code results in the same error:
@cuda.jit(device=True)
def lookup(x):
    return x + lut[1]
Once this error occurs, I am essentially no longer able to use the CUDA context at all. For instance, allocating a new array via cuda.to_device results in:
numba.cuda.cudadrv.driver.CudaAPIError: [719] Call to cuMemAlloc results in UNKNOWN_CUDA_ERROR
Running on: 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)
Driver Version: 390.25
numba: 0.33.0
The above code is fixed by modifying the body of lookup as follows:
@cuda.jit(device=True)
def lookup(x):
    lut_device = cuda.const.array_like(lut)
    return lut_device[x]
I ran multiple variations of the code, including one that simply touched the lookup table from within the kernel without using its output. This, combined with @talonmies' assertion that UNKNOWN_CUDA_ERROR usually occurs with invalid instructions, made me think that perhaps a shared memory constraint was causing the issue.
The change above makes the whole thing work. However, I still don't understand why in any profound way.
If anyone knows and understands why, please feel free to contribute to this answer.
I am trying to perform sparse matrix calculations using scipy for an algorithm that requires intensive, dependent computations (PageRank) on very large RDF datasets. I want to use multiple cores for the scipy calculation in the following code:
import numpy as np
from scipy import sparse

F = sparse.coo_matrix((y['data'], (y['row'], y['col'])), shape=y['shape'])
W = sparse.coo_matrix((y['data'], (y['row'], y['col'])), shape=y['shape'])
P = sparse.bmat([[None, W], [F, None]])

previous = np.ones(n) / n
ones = np.ones(n) / n
while error > epsilon:
    tmp = np.array(previous)
    previous = damping * P.T.dot(previous) + (1 - damping) * ones
    error = np.linalg.norm(tmp - previous)
    if printerror:
        print(error)
I have searched every possible answer I could find, and I tried integrating MKL (the Anaconda build) into the code, but the performance does not seem to scale up on multiple cores. I have come to understand that the scipy code in csr.h does not make use of BLAS calls. I am wondering whether I need to replace the call to csr_matvec in scipy/sparsetools with an appropriate Sparse BLAS call, since MKL provides those, and then link scipy to MKL. Am I misunderstanding something or missing something? I would really appreciate some help on the matter. One similar question is here. Thanks!!
I have a piece of code that is supposed to calculate a simple matrix product in Python (using Theano). The matrix I intend to multiply with is a shared variable.
The example below is the smallest one that demonstrates my problem.
I have made use of two helper functions: floatX converts its input to something of type theano.config.floatX, and init_weights generates a random matrix (of type floatX) with the given dimensions.
The last line causes the code to crash. In fact, it produces so much output on the command line that I can't even scroll to the top of it anymore.
So, can anyone tell me what I'm doing wrong?
import numpy
import theano
import theano.tensor as T

def floatX(x):
    return numpy.asarray(x, dtype=theano.config.floatX)

def init_weights(shape):
    return floatX(numpy.random.randn(*shape))

a = init_weights([3, 3])
b = theano.shared(value=a, name="b")
x = T.matrix()
y = T.dot(x, b)
f = theano.function([x], y)
This works for me, so my guess is that you have a problem with your BLAS installation. Make sure to use the Theano development version:
http://deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions
It has better defaults for some configurations. If that does not fix the problem, look at the error message: the main part comes after the code dump and after the stack trace, and that is normally the most useful part.
You can disable Theano's direct linking to BLAS with this Theano flag: blas.ldflags= (left empty).
This can cause a slowdown, but it is a quick check to confirm that the problem is BLAS.
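For example, a minimal way to set that flag from Python itself; it has to happen before theano is imported, because THEANO_FLAGS is read at import time:
import os
os.environ["THEANO_FLAGS"] = "blas.ldflags="  # empty value disables direct BLAS linking
import theano  # picks up the flag on import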
If you want more help, dump the error message to a text file, put it on the web, and link to it from here.
Basically, I am getting a memory error in Python when trying to perform an algebraic operation on a numpy matrix. The variable u is a large matrix of doubles (in the failing case a 288x288x156 matrix of doubles; I only get this error in this huge case, but I am able to do this on other large matrices, just not this big). Here is the Python error:
Traceback (most recent call last):
  File "S:\3D_Simulation_Data\Patient SPM Segmentation\20 pct perim erosion flattop\SwSim.py", line 121, in __init__
    self.mainSimLoop()
  File "S:\3D_Simulation_Data\Patient SPM Segmentation\20 pct perim erosion flattop\SwSim.py", line 309, in mainSimLoop
    u = solver.solve_cg(u,b,tensors,param,fdHold,resid) # Solve the left hand side of the equation Au=b with conjugate gradient method to approximate u
  File "S:\3D_Simulation_Data\Patient SPM Segmentation\20 pct perim erosion flattop\conjugate_getb.py", line 47, in solve_cg
    u = u + alpha*p
MemoryError
u = u + alpha*p is the line of code that fails.
alpha is just a double, while u and p are the large matrices described above (both of the same size).
I don't know that much about memory errors, especially in Python. Any insight/tips on solving this would be very appreciated!
Thanks
Rewrite it to:
p *= alpha
u += p
and this will use much less memory. Whereas p = p*alpha allocates a whole new matrix for the result of p*alpha and then discards the old p, p *= alpha does the same thing in place.
In general, with big matrices, try to use op= assignment.
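As a minimal sketch with the array size from the question (the value of alpha is just illustrative):
import numpy as np

u = np.zeros((288, 288, 156))  # ~100 MB each as float64
p = np.ones((288, 288, 156))
alpha = 0.5                    # illustrative value

# u = u + alpha*p    # would build two extra ~100 MB temporaries
p *= alpha           # in place, no new allocation
u += p               # in place, no new allocation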
Another tip I have found to avoid memory errors is to manually control garbage collection. When objects are deleted or go out of scope, the memory used for these variables isn't freed until a garbage collection is performed. I have found with some of my code using large numpy arrays that I get a MemoryError, but that I can avoid it if I insert calls to gc.collect() at appropriate places.
You should only look into this option if using "op=" style operators and the like doesn't solve your problem, as it's probably not the best coding practice to have gc.collect() calls everywhere.
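A minimal sketch of that pattern (the del drops the last reference so the array becomes collectable):
import gc
import numpy as np

big = np.ones((288, 288, 156))
# ... use big ...
del big       # drop the last reference to the array
gc.collect()  # ask Python to reclaim the memory now rather than later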
Your matrix has 288x288x156 = 12,939,264 entries, which at 8 bytes per double is roughly 100 MB per array; with the temporaries that an expression like u + alpha*p creates, the peak usage could come out to around 400 MB in memory. numpy throwing a MemoryError at you just means that, in the function you called, the memory needed to perform the operation wasn't available from the OS.
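Spelling out the arithmetic (that four arrays of this size are live at the peak of u + alpha*p is an assumption about how the temporaries are created):
entries = 288 * 288 * 156        # 12,939,264 entries
per_array = entries * 8 / 1e6    # ~103 MB for one float64 array
peak = 4 * per_array             # u, p, alpha*p and the new sum: ~414 MB
print(per_array, peak)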
If you can work with sparse matrices this might save you a lot of memory.