Issue when trying to invert an array of 2D matrices - Python

I have a simple issue with the inversion of a 4x4 matrix, specifically when I try to do it with a loop over the integ_prec indices (integ_prec = 6 here and dimBlocks = 4).
Here is the code snippet:
# Declaration of inverse cross matrix
invCrossMatrix = np.zeros((dimBlocks, dimBlocks, integ_prec, integ_prec))

# Build observables covariance matrix
arrayFullCross_vec = buildObsCovarianceMatrix_vec(k_ref, mu_ref, ir)

# Invert 4x4 covariance matrix
for r_p in range(integ_prec):
    for s_p in range(integ_prec):
        invCrossMatrix[:][:][r_p][s_p] = np.linalg.inv(arrayFullCross_vec[:][:][r_p][s_p])
The function buildObsCovarianceMatrix_vec returns a 4D array:
def buildObsCovarianceMatrix_vec(k_ref, mu_ref, ir):
    arrayCrossTemp = np.zeros((dimBlocks, dimBlocks, integ_prec, integ_prec))
    # ... processing
    return arrayCrossTemp
But I systematically get an error when the inversion occurs:
File "GC_forecast_8bins_base_Mpc_DESI_dev.py", line 1345, in integ_LU_cross
function_A = aux_fun_LU_cross_vec(ecs, way, I1[0], I1[1], I1[2])
File "GC_forecast_8bins_base_Mpc_DESI_dev.py", line 1216, in aux_fun_LU_cross_vec
invCrossMatrix[r_p][s_p][:][:] = np.linalg.inv(arrayFullCross_vec[:][:][r_p][s_p])
File "/Users/fab/Library/Python/2.7/lib/python/site-packages/numpy/linalg/linalg.py", line 551, in inv
ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
File "/Users/fab/Library/Python/2.7/lib/python/site-packages/numpy/linalg/linalg.py", line 97, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix
With another version of my code (using scalar values), everything works fine. I expect to invert a 4x4 matrix at each iteration of the loop.
Is the syntax invCrossMatrix[:][:][r_p][s_p] = np.linalg.inv(arrayFullCross_vec[:][:][r_p][s_p]) correct?
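For what it's worth, here is a minimal sketch (with toy data) of why that chained indexing picks the wrong sub-array: A[:] is just a view of the whole array, so A[:][:][r_p][s_p] is the same as A[r_p][s_p], which indexes the first two axes rather than the last two. Selecting the intended 4x4 block needs a single multi-axis index, and np.linalg.inv can even invert all the blocks at once because it broadcasts over leading axes:

import numpy as np

dimBlocks, integ_prec = 4, 6
rng = np.random.default_rng(0)
A = rng.standard_normal((dimBlocks, dimBlocks, integ_prec, integ_prec))

r_p, s_p = 0, 0
# Chained [:] indexing degenerates to A[r_p][s_p]: the wrong axes, which is
# presumably why the inversion hits a singular matrix in the question's data.
print(A[:][:][r_p][s_p].shape)  # (6, 6)
# A single multi-axis index selects the intended 4x4 block:
print(A[:, :, r_p, s_p].shape)  # (4, 4)

# np.linalg.inv broadcasts over leading axes, so the double loop can be
# replaced by moving the matrix axes last and inverting in one call:
invAll = np.linalg.inv(np.moveaxis(A, (0, 1), (2, 3)))  # shape (6, 6, 4, 4)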

Related

Python--calculate normalized probability of a value given a list of samples

So, as the title says, I'm trying to calculate the probability of a value given a list of samples, preferably normalized so that 0 < p < 1. I found this answer on the topic from about 6 years ago, which seemed promising. To test it, I implemented the example used in the first reply (edited for brevity):
import numpy as np
from sklearn.neighbors import KernelDensity
from scipy.integrate import quad

# Generate random samples from a mixture of 2 Gaussians
# with modes at 5 and 10
data = np.concatenate((5 + np.random.randn(10, 1),
                       10 + np.random.randn(30, 1)))
x = np.linspace(0, 16, 1000)[:, np.newaxis]

# Do kernel density estimation
kd = KernelDensity(kernel='gaussian', bandwidth=0.75).fit(data)

# Get probability for range of values
start = 5  # Start of the range
end = 6    # End of the range
probability = quad(lambda x: np.exp(kd.score_samples(x)), start, end)[0]
However, this approach throws the following error:
Traceback (most recent call last):
File "prob test.py", line 44, in <module>
probability = quad(lambda x: np.exp(kd.score_samples(x)), start, end)[0]
File "/usr/lib/python3/dist-packages/scipy/integrate/quadpack.py", line 340, in quad
retval = _quad(func, a, b, args, full_output, epsabs, epsrel, limit,
File "/usr/lib/python3/dist-packages/scipy/integrate/quadpack.py", line 448, in _quad
return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
File "prob test.py", line 44, in <lambda>
probability = quad(lambda x: np.exp(kd.score_samples(x)), start, end)[0]
File "/usr/lib/python3/dist-packages/sklearn/neighbors/_kde.py", line 190, in score_samples
X = check_array(X, order='C', dtype=DTYPE)
File "/usr/lib/python3/dist-packages/sklearn/utils/validation.py", line 545, in check_array
raise ValueError(
ValueError: Expected 2D array, got scalar array instead:
array=5.5.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I'm not sure how to reshape the distribution when it's already inside the lambda function, and, in any case, I'm guessing this is happening because Scikit-Learn has been updated in the 6 years since this answer was written. What's the best way to work around this issue to get the probability value?
Thanks!
As stated in the library documentation:
score_samples(X): X array-like of shape (n_samples, n_features)
Therefore, you should pass an array-like and not a scalar:
probability = quad(lambda x: np.exp(kd.score_samples(np.array([[x]])))[0], start, end)[0]
or:
probability = quad(lambda x: np.exp(kd.score_samples(np.array([x]).reshape(-1, 1)))[0], start, end)[0]
(The inner [0] turns the length-1 array returned by score_samples into the scalar quad expects, and the trailing [0] keeps just the integral value, as in the question.)
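If the one-call-per-evaluation overhead matters, an alternative sketch (assuming kd, start, and end from above) is to evaluate the density once on a grid and integrate with the trapezoidal rule:

import numpy as np

grid = np.linspace(start, end, 1000)[:, np.newaxis]
density = np.exp(kd.score_samples(grid))  # one vectorized call instead of many
probability = np.trapz(density, grid.ravel())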

Creating an ITK displacement field image from a numpy array

In my PhD project I analyze 3D microCT datasets of lung tissue samples. One topic is the simulation of an atelectasis by warping the image using ITK Python. In order to achieve that (with the WarpImageFilter or the ResampleImageFilter in ITK) I have to create a displacement vector field. Therefore, I have to convert a 3D numpy array into an itk image using the GetImageFromArray function. The resulting output should be in a format which the ResampleImageFilter or WarpImageFilter can work with.
Here's my code:
import itk
import numpy as np

array1 = []
for i in range(-5, 5):
    for j in range(-5, 5):
        for k in range(-5, 5):
            if i == 0 and j == 0 and k == 0:
                array1.append([0, 0, 0])
            else:
                x = float(i) / float(i**2 + j**2 + k**2)
                y = float(j) / float(i**2 + j**2 + k**2)
                z = float(k) / float(i**2 + j**2 + k**2)
                array1.append([x, y, z])

displacementFieldFileName = itk.image_from_array(np.reshape(array1, (10, 10, 10, 3)), is_vector=True)
The last line shows the conversion from a numpy array into a 3D ITK vector image format which is needed by the filters mentioned above. However, I receive the following error message:
Traceback (most recent call last):
File "Test_Displacement.py", line 39, in <module>
displacementFieldFileName = itk.image_from_array(np.reshape(array1, (10,10,10,3)), is_vector = True)
File "/XXXX/YYYY/.local/lib/python2.7/site-packages/itkExtras.py", line 297, in GetImageFromArray
return _GetImageFromArray(arr, "GetImageFromArray", is_vector)
File "/XXXX/YYYY/.local/lib/python2.7/site-packages/itkExtras.py", line 291, in _GetImageFromArray
templatedFunction = getattr(itk.PyBuffer[ImageType], function)
File "/XXXX/YYYY/.local/lib/python2.7/site-packages/itkTemplate.py", line 340, in __getitem__
raise TemplateTypeError(self, tuple(cleanParameters))
itkTemplate.TemplateTypeError: itk.PyBuffer is not wrapped for input type itk.Image[itk.Vector[itk.D,3],3].
A similar topic can be found here:
https://discourse.itk.org/t/importing-image-from-array-and-axis-reorder/1192
I already tried using dtype=np.float32 and .astype(np.float32) to specify the float data type but this leads to another error:
File "Test_Displacement.py", line 59, in <module>
fieldReader.SetFileName(displacementFieldFileName)
TypeError: in method 'itkImageFileReaderIF3_SetFileName', argument 2 of type 'std::string const &'
How can the displacement field be created properly? Any help will be highly appreciated!
Alex
It seems like it's asking for:
itk.Image[itk.Vector[itk.D,3],3]
rather than a numpy array. Or maybe your numpy array has the wrong dimensionality.
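Reading the two tracebacks together, a hedged sketch of what may fix both: build the array as float32 so the conversion targets a vector type ITK is wrapped for, and pass the resulting image object straight to the warp/resample filter instead of to a file reader (the second TypeError comes from handing an image to SetFileName, which expects a string path):

import itk
import numpy as np

# Assuming array1 from the question. float32 maps to itk.Vector[itk.F,3],
# which PyBuffer is typically wrapped for (itk.D, i.e. float64, often is not).
field = np.asarray(array1, dtype=np.float32).reshape(10, 10, 10, 3)
displacement_image = itk.image_from_array(field, is_vector=True)

# Use displacement_image directly as the displacement-field input of
# WarpImageFilter / ResampleImageFilter; do not pass it to SetFileName.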

python sparse gmres messes with input arguments

I have some simple code to solve a sparse linear system using scipy.sparse.linalg.gmres:
W, S = load_data()
M = normalize(W.T.astype('float64'),'l1')
S = normalize(S.astype('float64'),'l1')
rhs = S[cat_id,:].T
print M.shape
print rhs.shape
p = gmres(M, rhs)
The function load_data loads two sparse matrices from MATLAB .mat files and is omitted here.
The output is surprising:
(150495, 150495)
(150495, 1)
Traceback (most recent call last):
File "explain.py", line 54, in <module>
pr(1)
File "explain.py", line 42, in pr
p = gmres(M, rhs)
File "<string>", line 2, in gmres
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/isolve/iterative.py", line 85, in non_reentrant
return func(*a, **kw)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/isolve/iterative.py", line 418, in gmres
A,M,x,b,postprocess = make_system(A,M,x0,b,xtype)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/isolve/utils.py", line 78, in make_system
raise ValueError('A and b have incompatible dimensions')
ValueError: A and b have incompatible dimensions
But I've run gmres in accordance with the documentation:
A : {sparse matrix, dense matrix, LinearOperator}
The real or complex N-by-N matrix of the linear system.
b : {array, matrix}
Right hand side of the linear system. Has shape (N,) or (N,1).
I simply don't understand what is wrong with this code and would like any ideas.
The argument b of gmres must not be a sparse matrix; it can be a numpy array or matrix. Try
p = gmres(M, rhs.A)
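A minimal self-contained sketch of the same fix, with a toy system standing in for the question's data (.A on a SciPy sparse matrix is shorthand for .toarray()):

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import gmres

M = sparse.eye(5, format='csr') * 2.0     # 5x5 sparse system matrix
rhs = sparse.csr_matrix(np.ones((5, 1)))  # sparse right-hand side

x, info = gmres(M, rhs.A)  # densify the right-hand side before the call
print(x, info)             # info == 0 signals convergence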

scipy.optimize.curvefit() - array must not contain infs or NaNs

I am trying to fit some data to a curve in Python using scipy.optimize.curve_fit. I am running into the error ValueError: array must not contain infs or NaNs.
I don't believe either my x or y data contain infs or NaNs:
>>> x_array = np.asarray_chkfinite(x_array)
>>> y_array = np.asarray_chkfinite(y_array)
>>>
To give some idea of what my x_array and y_array look like at either end (x_array is counts and y_array is quantiles):
>>> type(x_array)
<type 'numpy.ndarray'>
>>> type(y_array)
<type 'numpy.ndarray'>
>>> x_array[:5]
array([0, 0, 0, 0, 0])
>>> x_array[-5:]
array([2919, 2965, 3154, 3218, 3461])
>>> y_array[:5]
array([ 0.9999582, 0.9999163, 0.9998745, 0.9998326, 0.9997908])
>>> y_array[-5:]
array([ 1.67399000e-04, 1.25549300e-04, 8.36995200e-05,
4.18497600e-05, -2.22044600e-16])
And my function:
>>> def func(x,alpha,beta,b):
... return ((x/1)**(-alpha) * ((x+1*b)/(1+1*b))**(alpha-beta))
...
Which I am executing with:
>>> popt, pcov = curve_fit(func, x_array, y_array)
resulting in the error stack trace:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py", line 426, in curve_fit
res = leastsq(func, p0, args=args, full_output=1, **kw)
File "/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py", line 338, in leastsq
cov_x = inv(dot(transpose(R),R))
File "/usr/lib/python2.7/dist-packages/scipy/linalg/basic.py", line 285, in inv
a1 = asarray_chkfinite(a)
File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 590, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
I'm guessing the error might not be with respect to my arrays, but rather an array created by scipy in an intermediate step? I've had a bit of a dig through the relevant scipy source files, but things get hairy pretty quickly when debugging the problem that way. Is there something obvious I'm doing wrong here? I've seen it casually mentioned in other questions that certain initial parameter guesses (of which I currently don't supply any explicitly) can sometimes result in these kinds of errors, but even if this is the case, it would be good to know a) why that is and b) how to avoid it.
Why it is failing
It is not your input arrays that contain NaNs or infs; rather, the evaluation of your objective function at some x points and for some values of the parameters produces NaNs or infs: in other words, the array of values func(x, alpha, beta, b) contains NaNs or infs for some x, alpha, beta, and b encountered during the optimization routine.
SciPy's curve-fitting function uses the Levenberg-Marquardt algorithm, also called damped least-squares optimization. It is an iterative procedure in which a new estimate for the optimal parameters is computed at each iteration. At some point during the optimization, the algorithm explores a region of the parameter space where your function is not defined.
How to fix
1/Initial guess
The initial guess for the parameters is decisive for convergence. If the initial guess is far from the optimal solution, you are more likely to explore regions where the objective function is undefined. So, if you have a better clue of what your optimal parameters are, feeding the algorithm this initial guess may avoid the error, as sketched below.
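A hedged sketch of passing a starting point via curve_fit's p0 argument (the values are illustrative, not tuned; func, x_array, and y_array are the question's):

from scipy.optimize import curve_fit

p0 = (1.0, 2.0, 0.5)  # hypothetical starting values for alpha, beta, b
popt, pcov = curve_fit(func, x_array, y_array, p0=p0)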
2/Model
You could also modify your model so that it does not return NaNs. For those parameter values params where the original function func is not defined, you want the objective function to take huge values, or in other words you want func(params) to be far from the Y values being fitted.
At points where your objective function is not defined, you may return a big float, for instance AVG(Y) * 10e5, with AVG the average of the Y values to be fitted (so that you are sure to be much bigger than that average).
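A minimal sketch of such a guard around the question's model (the *1 and /1 factors are dropped, and BIG is an illustrative constant to scale to your data):

import numpy as np

def func_safe(x, alpha, beta, b):
    # Evaluate the original model, silencing divide/invalid warnings...
    with np.errstate(divide='ignore', invalid='ignore'):
        y = x**(-alpha) * ((x + b) / (1.0 + b))**(alpha - beta)
    # ...then replace non-finite values with a large penalty so the optimizer
    # is pushed away from that region of the parameter space.
    BIG = 1e10
    return np.where(np.isfinite(y), y, BIG)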
Link
You could have a look at this post: Fitting data to an equation in python vs gnuplot
Your function has a negative power (x^-alpha), which is the same as (1/x)^alpha. If x is ever 0, your function will return inf and your curve-fit operation will break; I'm surprised a warning/error isn't thrown earlier informing you of a division by zero.
BTW why are you multiplying and dividing by 1?
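Along those lines, one minimal workaround (a sketch, assuming x_array and y_array from the question) is to drop the x == 0 points before fitting:

from scipy.optimize import curve_fit

mask = x_array > 0
popt, pcov = curve_fit(func, x_array[mask], y_array[mask])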
I was able to reproduce this error in python2.7 like so:
import numpy as np
from sklearn.decomposition import FastICA

# X is a 2d numpy array containing large positive and negative numbers.
X = load_data.load("stuff")

ica = FastICA(whiten=False)
print(np.isnan(X).any())  # this prints False
print(np.isinf(X).any())  # this prints False
ica.fit(X)                # this produces the error
This always produces the error:
/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py:58: RuntimeWarning: invalid value encountered in sqrt
return np.dot(np.dot(u * (1. / np.sqrt(s)), u.T), W)
Traceback (most recent call last):
File "main.py", line 43, in <module>
ica()
File "main.py", line 18, in ica
ica.fit(X)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 523, in fit
self._fit(X, compute_sources=False)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 479, in _fit
compute_sources=compute_sources, return_n_iter=True)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 335, in fastica
W, n_iter = _ica_par(X1, **kwargs)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 108, in _ica_par
- g_wtx[:, np.newaxis] * W)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 55, in _sym_decorrelation
s, u = linalg.eigh(np.dot(W, W.T))
File "/usr/lib64/python2.7/site-packages/scipy/linalg/decomp.py", line 297, in eigh
a1 = asarray_chkfinite(a)
File "/usr/lib64/python2.7/site-packages/numpy/lib/function_base.py", line 613, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
Solution:
import numpy as np
from sklearn.decomposition import FastICA

# X is a 2d numpy array containing large positive and negative numbers.
X = load_data.load("stuff")

ica = FastICA(whiten=False)

# Column-wise normalization: rescales the two-dimensional array from very
# large and very small numbers to reasonably sized numbers between roughly
# -1 and 1.
X = (X - np.mean(X, axis=0)) / np.std(X, axis=0)

print(np.isnan(X).any())  # this prints False
print(np.isinf(X).any())  # this prints False
ica.fit(X)                # this works correctly
Why does that normalization step fix the error?
I found the eureka moment here: sklearn's PLSRegression: "ValueError: array must not contain infs or NaNs"
What I think is happening is that numpy is being fed gigantic numbers and very tiny numbers, and inside its tiny brain it's creating NaNs and infs. So it's a bug in sklearn. The workaround is to flatten your input data to the algorithm so that there are no very large or very small numbers.
Bad sklearn! NO biscuit!

Create a sparse diagonal matrix from row of a sparse matrix

I process rather large matrices in Python/SciPy. I need to extract rows from a large matrix (which is loaded as a coo_matrix) and use them as diagonal elements. Currently I do that in the following fashion:
import profile

import numpy as np
from scipy import sparse

def computation(A):
    for i in range(A.shape[0]):
        diag_elems = np.array(A[i, :].todense())
        ith_diag = sparse.spdiags(diag_elems, 0, A.shape[1], A.shape[1], format="csc")
        # ...

# create some random matrix
A = (sparse.rand(1000, 100000, 0.02, format="csc") * 5).astype(np.ubyte)

# get timings
profile.run('computation(A)')
What I see from the profile output is that most of the time is consumed by the get_csr_submatrix function while extracting diag_elems. That makes me think that I am using either an inefficient sparse representation of the initial data or the wrong way of extracting a row from a sparse matrix. Can you suggest a better way to extract a row from a sparse matrix and represent it in diagonal form?
EDIT
The following variant removes the bottleneck from the row extraction (notice that simply changing 'csc' to 'csr' is not sufficient; A[i,:] must be replaced with A.getrow(i) as well). However, the main question is how to omit the materialization (.todense()) and create the diagonal matrix from the sparse representation of the row.
import profile

import numpy as np
from scipy import sparse

def computation(A):
    for i in range(A.shape[0]):
        diag_elems = np.array(A.getrow(i).todense())
        ith_diag = sparse.spdiags(diag_elems, 0, A.shape[1], A.shape[1], format="csc")
        # ...

# create some random matrix
A = (sparse.rand(1000, 100000, 0.02, format="csr") * 5).astype(np.ubyte)

# get timings
profile.run('computation(A)')
If I create the DIAgonal matrix from a 1-row CSR matrix directly, as follows:
diag_elems = A.getrow(i)
ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1])
then I can neither specify the format="csc" argument nor convert ith_diag to CSC format:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/profile.py", line 70, in run
prof = prof.run(statement)
File "/usr/local/lib/python2.6/profile.py", line 456, in run
return self.runctx(cmd, dict, dict)
File "/usr/local/lib/python2.6/profile.py", line 462, in runctx
exec cmd in globals, locals
File "<string>", line 1, in <module>
File "<stdin>", line 4, in computation
File "/usr/local/lib/python2.6/site-packages/scipy/sparse/construct.py", line 56, in spdiags
return dia_matrix((data, diags), shape=(m,n)).asformat(format)
File "/usr/local/lib/python2.6/site-packages/scipy/sparse/base.py", line 211, in asformat
return getattr(self,'to' + format)()
File "/usr/local/lib/python2.6/site-packages/scipy/sparse/dia.py", line 173, in tocsc
return self.tocoo().tocsc()
File "/usr/local/lib/python2.6/site-packages/scipy/sparse/coo.py", line 263, in tocsc
data = np.empty(self.nnz, dtype=upcast(self.dtype))
File "/usr/local/lib/python2.6/site-packages/scipy/sparse/sputils.py", line 47, in upcast
raise TypeError,'no supported conversion for types: %s' % args
TypeError: no supported conversion for types: object
Here's what I came up with:
def computation(A):
    for i in range(A.shape[0]):
        idx_begin = A.indptr[i]
        idx_end = A.indptr[i + 1]
        row_nnz = idx_end - idx_begin
        diag_elems = A.data[idx_begin:idx_end]
        diag_indices = A.indices[idx_begin:idx_end]
        ith_diag = sparse.csc_matrix((diag_elems, (diag_indices, diag_indices)),
                                     shape=(A.shape[1], A.shape[1]))
        ith_diag.eliminate_zeros()
The Python profiler said 1.464 seconds versus 5.574 seconds before. The code takes advantage of the underlying dense arrays (indptr, indices, data) that define sparse matrices. Here's my crash course: A.indptr[i]:A.indptr[i+1] defines which elements in the dense arrays correspond to the non-zero values in row i, A.data is a dense 1d array of the non-zero values of A, and A.indices gives the column where each of those values goes.
I would do some more testing to make very certain this does the same thing as before. I only checked a few cases.
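In that spirit, a small sanity-check sketch (toy sizes, not the question's data) comparing the indptr/indices construction against the original todense()-based one for a single row:

import numpy as np
from scipy import sparse

A = (sparse.rand(50, 200, 0.05, format="csr") * 5).astype(np.float64)
i = 3
lo, hi = A.indptr[i], A.indptr[i + 1]
fast = sparse.csc_matrix((A.data[lo:hi], (A.indices[lo:hi], A.indices[lo:hi])),
                         shape=(A.shape[1], A.shape[1]))
slow = sparse.spdiags(np.asarray(A.getrow(i).todense()), 0,
                      A.shape[1], A.shape[1], format="csc")
print(abs(fast - slow).max() == 0)  # True when the two constructions agree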
