python sparse gmres messes with input arguments

I have a simple piece of code that solves a sparse linear system using scipy.sparse.linalg.gmres:
W, S = load_data()
M = normalize(W.T.astype('float64'),'l1')
S = normalize(S.astype('float64'),'l1')
rhs = S[cat_id,:].T
print M.shape
print rhs.shape
p = gmres(M, rhs)
The function load_data loads two sparse matrices from MATLAB .mat files and is omitted here.
The output is surprising:
(150495, 150495)
(150495, 1)
Traceback (most recent call last):
File "explain.py", line 54, in <module>
pr(1)
File "explain.py", line 42, in pr
p = gmres(M, rhs)
File "<string>", line 2, in gmres
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/isolve/iterative.py", line 85, in non_reentrant
return func(*a, **kw)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/isolve/iterative.py", line 418, in gmres
A,M,x,b,postprocess = make_system(A,M,x0,b,xtype)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/isolve/utils.py", line 78, in make_system
raise ValueError('A and b have incompatible dimensions')
ValueError: A and b have incompatible dimensions
But I've called gmres in accordance with the documentation:
A : {sparse matrix, dense matrix, LinearOperator}
The real or complex N-by-N matrix of the linear system.
b : {array, matrix}
Right hand side of the linear system. Has shape (N,) or (N,1).
I simply don't understand what is wrong with this code and would like any ideas.

The argument b of gmres must not be a sparse matrix; it can be a numpy array or matrix. Try
p = gmres(M, rhs.A)
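Here rhs.A is the dense ndarray equivalent of the sparse matrix. A minimal sketch, assuming M and rhs as built in the question; note that gmres returns a (solution, exit_code) tuple, so p in the original code would be that tuple rather than the solution vector:
import numpy as np
from scipy.sparse.linalg import gmres

# Convert the 1-column sparse right-hand side to a dense 1-D array
# (equivalent to rhs.A.ravel()) before calling the solver.
b = rhs.toarray().ravel()
x, info = gmres(M, b)
print(info)  # 0 means the solver converged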

Pybats forecast error when using Poisson family distribution

I am building a time-series PyBats model using a Poisson distribution for the observations. My model instantiation looks like this:
model = define_dglm(
    Y=data.actual.values,
    X=None,
    family="poisson",
    k=1,
    prior_length=8,
    dates=data["month"],
    ntrend=2,
    seasPeriods=[],
    seasHarmComponents=[],
    nsamps=10000,
)
Here data.actual.values is a numpy array of integers. After instantiating the model, I forecast into the future with PyBats by running
forecast_samples = model.forecast_path(k=steps_to_forecast, X=X_future, nsamps=10000)
and get the following error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.8/site-packages/pybats/dglm.py", line 289, in forecast_path
return forecast_path_copula(self, k, X, nsamps, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/pybats/forecast.py", line 211, in forecast_path_copula
return forecast_path_copula_sim(mod, k, lambda_mu, lambda_cov, nsamps, t_dist, nu)
File "/opt/conda/lib/python3.8/site-packages/pybats/forecast.py", line 326, in forecast_path_copula_sim
return np.array(list(map(lambda prior: mod.simulate_from_sampling_model(prior, nsamps),
File "/opt/conda/lib/python3.8/site-packages/pybats/forecast.py", line 326, in <lambda>
return np.array(list(map(lambda prior: mod.simulate_from_sampling_model(prior, nsamps),
File "/opt/conda/lib/python3.8/site-packages/pybats/dglm.py", line 477, in simulate_from_sampling_model
return np.random.poisson(rate, [nsamps])
File "mtrand.pyx", line 3573, in numpy.random.mtrand.RandomState.poisson
File "_common.pyx", line 824, in numpy.random._common.disc
File "_common.pyx", line 621, in numpy.random._common.discrete_broadcast_d
File "_common.pyx", line 355, in numpy.random._common.check_array_constraint
ValueError: lam value too large
I have tried converting my Y array to floats, and have tried replacing all 0 values with 1 and get the same error. What is causing this error?
The issue is that you are exceeding the maximum rate value allowed by numpy.random.poisson. It looks like any lam value of roughly 1E19 or larger passed to np.random.poisson will cause this error.
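For reference, the limit can be reproduced with numpy alone (the exact threshold is an internal constant, roughly 9.2e18 on 64-bit builds):
import numpy as np

np.random.poisson(1e18)       # fine
try:
    np.random.poisson(1e19)   # raises the same ValueError: lam value too large
except ValueError as e:
    print(e)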
A couple things you can try:
Use a longer prior length than 8 when defining the model. This will help produce more stable estimates of the coefficients. After defining your model, check what the coefficient mean vector (model.a) and covariance matrix (model.R) are, to make sure they're reasonable. If they're not, you can change them manually.
If some of your 'Y' values are truly that large, a Poisson model is probably not appropriate. I would suggest modeling log(Y) using the normal dlm model in Pybats.
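For the first suggestion, a short sketch of that check (assuming the model object defined in the question):
# Inspect the prior state before forecasting; extreme means or variances here
# propagate into enormous Poisson rates at forecast time.
print(model.a)   # coefficient mean vector
print(model.R)   # coefficient covariance matrix

# If a prior variance looks implausibly large, it can be shrunk manually, e.g.:
# model.R = model.R / 100.0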
I hope that this helps!
Thanks,
Isaac

Issue when trying to invert an array of 2D matrices

I have a simple issue with the inversion of a 4x4 matrix, especially when I try to do it inside a loop over the integ_prec indices (integ_prec = 6 and dimBlocks = 4 here).
Here is the code snippet:
# Declaration of inverse cross matrix
invCrossMatrix = np.zeros((dimBlocks,dimBlocks,integ_prec,integ_prec))
# Build observables covariance matrix
arrayFullCross_vec = buildObsCovarianceMatrix_vec(k_ref, mu_ref, ir)
# Invert 4x4 covariance matrix
for r_p in range(integ_prec):
    for s_p in range(integ_prec):
        invCrossMatrix[:][:][r_p][s_p] = np.linalg.inv(arrayFullCross_vec[:][:][r_p][s_p])
The function buildObsCovarianceMatrix_vec returns a 4D array:
def buildObsCovarianceMatrix_vec(k_ref, mu_ref, ir):
    arrayCrossTemp = np.zeros((dimBlocks,dimBlocks,integ_prec,integ_prec))
    # ... processing
    return arrayCrossTemp
But I systematically get an error when the inversion occurs:
File "GC_forecast_8bins_base_Mpc_DESI_dev.py", line 1345, in integ_LU_cross
function_A = aux_fun_LU_cross_vec(ecs, way, I1[0], I1[1], I1[2])
File "GC_forecast_8bins_base_Mpc_DESI_dev.py", line 1216, in aux_fun_LU_cross_vec
invCrossMatrix[r_p][s_p][:][:] = np.linalg.inv(arrayFullCross_vec[:][:][r_p][s_p])
File "/Users/fab/Library/Python/2.7/lib/python/site-packages/numpy/linalg/linalg.py", line 551, in inv
ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
File "/Users/fab/Library/Python/2.7/lib/python/site-packages/numpy/linalg/linalg.py", line 97, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix
With another version of my code (with scalar values), everything works fine.
I expect to invert a 4x4 matrix at each iteration of the loop.
Is the syntax invCrossMatrix[:][:][r_p][s_p] = np.linalg.inv(arrayFullCross_vec[:][:][r_p][s_p]) correct?
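For reference, chained indexing like arr[:][:][r_p][s_p] does not select the trailing axes: each [:] just returns the whole array again, so [r_p][s_p] ends up indexing the two leading (dimBlocks) axes, and np.linalg.inv receives a slice taken along the wrong dimensions. A minimal sketch of the intended per-index 4x4 inversion, assuming the shapes defined above:
for r_p in range(integ_prec):
    for s_p in range(integ_prec):
        # tuple indexing selects the trailing axes, yielding a 4x4 block
        invCrossMatrix[:, :, r_p, s_p] = np.linalg.inv(arrayFullCross_vec[:, :, r_p, s_p])

# Equivalent vectorized form: np.linalg.inv broadcasts over leading axes,
# so moving the 4x4 blocks to the last two axes inverts them all at once.
# blocks = np.moveaxis(arrayFullCross_vec, (0, 1), (-2, -1))      # shape (6, 6, 4, 4)
# invCrossMatrix = np.moveaxis(np.linalg.inv(blocks), (-2, -1), (0, 1))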

Save complex numpy array as image using scikit-image

I get the following error when I try to use io.imsave("image.jpg", array):
Traceback (most recent call last):
File "Fourer.py", line 37, in <module>
io.imsave( "test.jpg", fImage2)
File "C:\ProgramData\Miniconda3\lib\site-packages\skimage\io\_io.py", line 131, in imsave
if is_low_contrast(arr):
File "C:\ProgramData\Miniconda3\lib\site-packages\skimage\exposure\exposure.py", line 503, in is_low_contrast
dlimits = dtype_limits(image, clip_negative=False)
File "C:\ProgramData\Miniconda3\lib\site-packages\skimage\util\dtype.py", line 49, in dtype_limits
imin, imax = dtype_range[image.dtype.type]
KeyError: <class 'numpy.complex128'>
It's a 2D complex array I use:
array = [[ 3.25000000e+02+0.00000000e+00j -1.25000000e+01+1.72047740e+01j
-1.25000000e+01+4.06149620e+00j -1.25000000e+01-4.06149620e+00j
-1.25000000e+01-1.72047740e+01j]
[-6.25000000e+01+8.60238700e+01j -8.88178420e-16+8.88178420e-16j
0.00000000e+00+1.29059879e-15j 0.00000000e+00+1.29059879e-15j
-8.88178420e-16-8.88178420e-16j]
[-6.25000000e+01+2.03074810e+01j -8.88178420e-16+4.44089210e-16j
-3.55271368e-15+5.46706420e-15j -3.55271368e-15+5.46706420e-15j
-8.88178420e-16-4.44089210e-16j]
[-6.25000000e+01-2.03074810e+01j -8.88178420e-16+4.44089210e-16j
-3.55271368e-15-5.46706420e-15j -3.55271368e-15-5.46706420e-15j
-8.88178420e-16-4.44089210e-16j]
[-6.25000000e+01-8.60238700e+01j -8.88178420e-16+8.88178420e-16j
0.00000000e+00-1.29059879e-15j 0.00000000e+00-1.29059879e-15j
-8.88178420e-16-8.88178420e-16j]]
How can I save a complex array as an image?
If that matrix was obtained from the FFT of an image, then you first need to apply the inverse FFT. Only then can you save it using io.imsave.
If that is the case, take a look at skimage's documentation on the inverse Fourier transform.
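A minimal sketch of that route, assuming fImage2 was produced by np.fft.fft2 of a real image:
import numpy as np
from skimage import io, img_as_ubyte

# Invert the transform, keep the (real-valued) magnitude, rescale to [0, 1],
# and only then save it as an 8-bit image.
img = np.abs(np.fft.ifft2(fImage2))
img = (img - img.min()) / (img.max() - img.min() + 1e-12)
io.imsave("test.jpg", img_as_ubyte(img))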

scikit-learn crashing on certain amounts of data

I'm trying to process a numpy array with 71,000 rows of 200 float columns, and the two scikit-learn models I'm trying both give different errors when I exceed 5853 rows. I tried removing the problematic row, but it continues to fail. Can scikit-learn not handle this much data, or is it something else? X is a numpy array built from a list of lists.
KNN:
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
Error:
File "knn.py", line 48, in <module>
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/neighbors/base.py", line 642, in fit
return self._fit(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/neighbors/base.py", line 180, in _fit
raise ValueError("data type not understood")
ValueError: data type not understood
K-Means:
kmeans_model = KMeans(n_clusters=2, random_state=1).fit(X)
Error:
Traceback (most recent call last):
File "knn.py", line 48, in <module>
kmeans_model = KMeans(n_clusters=2, random_state=1).fit(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 702, in fit
X = self._check_fit_data(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 668, in _check_fit_data
X = atleast2d_or_csr(X, dtype=np.float64)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 134, in atleast2d_or_csr
"tocsr", force_all_finite)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 111, in _atleast2d_or_sparse
force_all_finite=force_all_finite)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 91, in array2d
X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
File "/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py", line 235, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
Please check the dtype of your matrix X, e.g. by typing X.dtype. If it is object or dtype('O'), then write the lengths of the lines of X into an array:
lengths = [len(line) for line in X]
Then take a look to see whether all lines have the same length, by invoking
np.unique(lengths)
If there is more than one number in the output, then your line lengths are different, e.g. from line 5853 on, but possibly not all the time.
Numpy data arrays are only useful if all lines have the same length (they continue to work if not, but they don't do what you expect). You should check what is causing this, correct it, and then go back to your KNN.
Here is an example of what happens if line lengths are not the same:
import numpy as np
rng = np.random.RandomState(42)
X = rng.randn(100, 20)
# now remove one element from the 56th line
X = list(X)
X[55] = X[55][:-1]
# turn it back into an ndarray
X = np.array(X)
# check the dtype
print X.dtype # returns dtype('O')
from sklearn.neighbors import NearestNeighbors
nbrs = NearestNeighbors()
nbrs.fit(X) # raises your first error
from sklearn.cluster import KMeans
kmeans = KMeans()
kmeans.fit(X) # raises your second error
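One way to locate and drop the offending rows before fitting (a sketch, assuming X is still the original list of lists; the most common row length is taken as the expected one):
lengths = np.array([len(row) for row in X])
expected = np.bincount(lengths).argmax()   # the most common row length
X_clean = np.array([row for row in X if len(row) == expected], dtype=np.float64)
print(X_clean.shape)                       # now a proper 2-D float array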

Create a sparse diagonal matrix from a row of a sparse matrix

I process rather large matrices in Python/SciPy. I need to extract rows from a large matrix (which is loaded as a coo_matrix) and use them as diagonal elements. Currently I do that in the following fashion:
import numpy as np
import profile
from scipy import sparse

def computation(A):
    for i in range(A.shape[0]):
        diag_elems = np.array(A[i,:].todense())
        ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1], format = "csc")
        #...

#create some random matrix
A = (sparse.rand(1000,100000,0.02,format="csc")*5).astype(np.ubyte)
#get timings
profile.run('computation(A)')
What I see from the profile output is that most of the time is consumed by the get_csr_submatrix function while extracting diag_elems. That makes me think that I am using either an inefficient sparse representation of the initial data or the wrong way of extracting a row from a sparse matrix. Can you suggest a better way to extract a row from a sparse matrix and represent it in diagonal form?
EDIT
The following variant removes the bottleneck from the row extraction (notice that simply changing 'csc' to 'csr' is not sufficient; A[i,:] must be replaced with A.getrow(i) as well). However, the main question is how to avoid the materialization (.todense()) and create the diagonal matrix from the sparse representation of the row.
import numpy as np
import profile
from scipy import sparse

def computation(A):
    for i in range(A.shape[0]):
        diag_elems = np.array(A.getrow(i).todense())
        ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1], format = "csc")
        #...

#create some random matrix
A = (sparse.rand(1000,100000,0.02,format="csr")*5).astype(np.ubyte)
#get timings
profile.run('computation(A)')
If I create a DIAgonal matrix from the 1-row CSR matrix directly, as follows:
diag_elems = A.getrow(i)
ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1])
then I can neither specify the format="csc" argument nor convert ith_diag to CSC format:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/profile.py", line 70, in run
prof = prof.run(statement)
File "/usr/local/lib/python2.6/profile.py", line 456, in run
return self.runctx(cmd, dict, dict)
File "/usr/local/lib/python2.6/profile.py", line 462, in runctx
exec cmd in globals, locals
File "<string>", line 1, in <module>
File "<stdin>", line 4, in computation
File "/usr/local/lib/python2.6/site-packages/scipy/sparse/construct.py", line 56, in spdiags
return dia_matrix((data, diags), shape=(m,n)).asformat(format)
File "/usr/local/lib/python2.6/site-packages/scipy/sparse/base.py", line 211, in asformat
return getattr(self,'to' + format)()
File "/usr/local/lib/python2.6/site-packages/scipy/sparse/dia.py", line 173, in tocsc
return self.tocoo().tocsc()
File "/usr/local/lib/python2.6/site-packages/scipy/sparse/coo.py", line 263, in tocsc
data = np.empty(self.nnz, dtype=upcast(self.dtype))
File "/usr/local/lib/python2.6/site-packages/scipy/sparse/sputils.py", line 47, in upcast
raise TypeError,'no supported conversion for types: %s' % args
TypeError: no supported conversion for types: object
Here's what I came up with:
def computation(A):
    for i in range(A.shape[0]):
        idx_begin = A.indptr[i]
        idx_end = A.indptr[i+1]
        row_nnz = idx_end - idx_begin
        diag_elems = A.data[idx_begin:idx_end]
        diag_indices = A.indices[idx_begin:idx_end]
        ith_diag = sparse.csc_matrix((diag_elems, (diag_indices, diag_indices)), shape=(A.shape[1], A.shape[1]))
        ith_diag.eliminate_zeros()
The Python profiler reported 1.464 seconds versus 5.574 seconds before. It takes advantage of the underlying dense arrays (indptr, indices, data) that define sparse matrices. Here's my crash course: A.indptr[i]:A.indptr[i+1] defines which elements in the dense arrays correspond to the non-zero values in row i. A.data is a dense 1-D array of the non-zero values of A, and A.indices holds the column index where each of those values goes.
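As a small illustration of those three arrays (a toy CSR matrix, not the one from the question):
from scipy import sparse
import numpy as np

B = sparse.csr_matrix(np.array([[0, 3, 0],
                                [4, 0, 5]]))
print(B.data)     # [3 4 5]  the non-zero values, row by row
print(B.indices)  # [1 0 2]  the column index of each stored value
print(B.indptr)   # [0 1 3]  row i's values live in data[indptr[i]:indptr[i+1]]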
I would do some more testing to make very certain this does the same thing as before. I only checked a few cases.
