Sparse Matrix error in MLPRegressor - python

Context
I'm running into an error when trying to use sparse matrices as an input to sklearn.neural_network.MLPRegressor. Nominally, this method is able to handle sparse matrices. I think this might be a bug in scikit-learn, but wanted to check on here before I submit an issue.
The Problem
When passing a scipy.sparse input to sklearn.neural_network.MLPRegressor I get:
ValueError: input must be a square array
The error is raised by the matrix_power function within numpy.matrixlab.defmatrix. It seems to occur because matrix_power passes the sparse matrix to numpy.asanyarray (L137), which returns an array of size=1, ndim=0 containing the sparse matrix object. matrix_power then performs some dimension checks (L138-141) to make sure the input is a square matrix, which fail because the array returned by numpy.asanyarray is not square, even though the underlying sparse matrix is square.
As far as I can tell, the problem stems from numpy.asanyarray preventing the dimensions of the sparse matrix being determined. The sparse matrix itself has a size attribute which would allow it to pass the dimension checks, but only if it's not run through asanyarray.
I think this might be a bug, but don't want to dive around filing issues until I've confirmed that I'm not just being an idiot! Please see below, to check.
If it is a bug, where would be the most appropriate place to raise an issue? NumPy? SciPy? or Scikit-Learn?
Minimal Example
Environment
Arch Linux
kernel 4.15.7-1
Python 3.6.4
numpy 1.14.1
scipy 1.0.0
sklearn 0.19.1
Code
import numpy as np
from scipy import sparse
from sklearn import model_selection
from sklearn.preprocessing import StandardScaler, Imputer
from sklearn.neural_network import MLPRegressor
## Generate some synthetic data
def fW(A, B, C):
return A * np.random.normal(.3, .1) + B * np.random.normal(.6, .1)
def fX(A, B, C):
return B * np.random.normal(-1, .1) + A * np.random.normal(-.9, .1) / C
# independent variables
N = int(1e4)
A = np.random.uniform(2, 12, N)
B = np.random.uniform(2, 12, N)
C = np.random.uniform(2, 12, N)
# synthetic data
mW = fW(A, B, C)
mX = fX(A, B, C)
# combine datasets
real = np.vstack([A, B, C]).T
meas = np.vstack([mW, mX]).T
# add noise to meas
meas *= np.random.normal(1, 0.0001, meas.shape)
## Make data sparse
prob_null = 0.2
real[np.random.choice([True, False], real.shape, p=[prob_null, 1-prob_null])] = np.nan
meas[np.random.choice([True, False], meas.shape, p=[prob_null, 1-prob_null])] = np.nan
# NB: problem persists whichever sparse matrix method is used.
real = sparse.csr_matrix(real)
meas = sparse.csr_matrix(meas)
# replace missing values with mean
rmnan = Imputer()
real = rmnan.fit_transform(real)
meas = rmnan.fit_transform(meas)
# split into test/training sets
real_train, real_test, meas_train, meas_test = model_selection.train_test_split(real, meas, test_size=0.3)
# create scalers and apply to data
real_scaler = StandardScaler(with_mean=False)
meas_scaler = StandardScaler(with_mean=False)
real_scaler.fit(real_train)
meas_scaler.fit(meas_train)
treal_train = real_scaler.transform(real_train)
tmeas_train = meas_scaler.transform(meas_train)
treal_test = real_scaler.transform(real_test)
tmeas_test = meas_scaler.transform(meas_test)
nn = MLPRegressor((100,100,10), solver='lbfgs', early_stopping=True, activation='tanh')
nn.fit(tmeas_train, treal_train)
## ERROR RAISED HERE
## The problem:
# the sparse matrix has a shape attribute that would pass the square matrix validation
tmeas_train.shape
# but not after it's been through asanyarray
np.asanyarray(tmeas_train).shape

MLPRegressor.fit() as given in documentation supports sparse matrix for X but not for y
Parameters:
X : array-like or sparse matrix, shape (n_samples, n_features)
The input data.
y : array-like, shape (n_samples,) or (n_samples, n_outputs)
The target values (class labels in classification, real numbers in regression).
I am able to successfully run your code with:
nn.fit(tmeas_train, treal_train.toarray())

Related

PCA implementation on 3D numpy array

I have a feature set of size 2240*5*16. 2240 are number of samples, 5 represents number of channels and 16 shows # of statistical features extracted such as mean, variance, etc.
Now, I want to apply PCA. However, PCA is applicable on 2D array. I applied the following code:
from sklearn.decomposition import PCA
pca = PCA(n_components=5)
pca.fit(features)
I get the following error.
ValueError: Found array with dim 3. Estimator expected <= 2.
It doesn't support axis argument. As it is only applicable on 2D, how can I utilize it on my case (3D)? Any suggestion, if I want to reduce the dimensions from 2240*5*16 to 2240*5*5, please?
I would just loop over each channel and do PCA separately.
import numpy as np
from sklearn.decomposition import PCA
X = np.random.rand(1000, 5, 10)
X_transform = np.zeros((X.shape[0], 5, 5))
for i in range(X.shape[1]):
pca = PCA(n_components=5)
f = pca.fit_transform(X[:, i, :])
X_transform[:, i, :] = f
print((X_transform.shape))

Cannot reshape numpy array to vector

I am trying to reshape an (N, 1) array d to an (N,) vector. According to this solution and my own experience with numpy, the following code should convert it to a vector:
from sklearn.neighbors import kneighbors_graph
from sklearn.datasets import make_circles
X, labels = make_circles(n_samples=150, noise=0.1, factor=0.2)
A = kneighbors_graph(X, n_neighbors=5)
d = np.sum(A, axis=1)
d = d.reshape(-1)
However, d.shape gives (1, 150)
The same happens when I exactly replicate the code for the linked solution. Why is the numpy array not reshaping?
The issue is that the sklearn functions returned the nearest neighbor graph as a sparse.csr.csr_matrix. Applying np.sum returned a numpy.matrix, a data type that (in my opinion) should no longer exist. numpy.matrixs are incompatible with just about everything, and numpy operations on them return unexpected results.
The solution was casting the numpy.csr.csr_matrix to a numpy.array:
A = kneighbors_graph(X, n_neighbors=5)
A = A.toarray()
d = np.sum(A, axis=1)
d = d.reshape(-1)
Now we have d.shape = (150,)

Issues using the scipy.sparse.linalg linear system solvers

I've got linear system to solve which consists of large, sparse matrices.
I've been using the scipy.sparse library, and its linalg sub-library to do this, but I can't get some of the linear solvers to work.
Here is a working example which reproduces the issue for me:
from numpy.random import random
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spsolve, minres
N = 10
A = csc_matrix( random(size = (N,N)) )
A = (A.T).dot(A) # force the matrix to be symmetric, as required by minres
x = csc_matrix( random(size = (N,1)) ) # create a solution vector
b = A.dot(x) # create the RHS vector
# verify shapes and types are correct
print('A', A.shape, type(A))
print('x', x.shape, type(x))
print('b', b.shape, type(b))
# spsolve function works fine
sol1 = spsolve(A, b)
# other solvers throw a incompatible dimensions ValueError
sol2 = minres(A, b)
Running this produces the following error
raise ValueError('A and b have incompatible dimensions')
ValueError: A and b have incompatible dimensions
for the call to minres, even though the dimensions clearly are compatible. Other solvers in scipy.sparse.linalg, such as cg, lsqr and gmres all throw an identical error.
This is being run on python 3.6.1 with SciPy 0.19.
Anyone have any idea what's going on here?
Thanks!
Your usage is incompatible with the API!
spsolve on b:
b : ndarray or sparse matrix
The matrix or vector representing the right hand side of the equation. If a vector, b.shape must be (n,) or (n, 1).
sparse b is allowed
minres on b:
b : {array, matrix}
Right hand side of the linear system. Has shape (N,) or (N,1).
sparse b is not allowed here!
The same applies to the mentioned non-working solvers (where lsqr might be a bit different -> array_like vs. array).
This is not that uncommon as sparse rhs-vectors are not helping in many cases and a lot of numerical-optimization devs therefore drop support!
This works:
sol2 = minres(A, b.todense())
(you got my upvote and praise for the nice reproducible example!)

Linear Discriminant Analysis inverse transform

I try to use Linear Discriminant Analysis from scikit-learn library, in order to perform dimensionality reduction on my data which has more than 200 features. But I could not find the inverse_transform function in the LDA class.
I just wanted to ask, how can I reconstruct the original data from a point in LDA domain?
Edit base on #bogatron and #kazemakase answer:
I think the term "original data" was wrong and instead I should use "original coordinate" or "original space". I know without all PCAs we can't reconstruct the original data, but when we build the shape space we project the data down to lower dimension with help of PCA. The PCA try to explain the data with only 2 or 3 components which could capture the most of the variance of the data and if we reconstruct the data base on them it should show us the parts of the shape that causes this separation.
I checked the source code of the scikit-learn LDA again and I noticed that the eigenvectors are store in scalings_ variable. when we use the svd solver, it's not possible to inverse the eigenvectors (scalings_) matrix, but when I tried the pseudo-inverse of the matrix, I could reconstruct the shape.
Here, there are two images which are reconstructed from [ 4.28, 0.52] and [0, 0] points respectively:
I think that would be great if someone explain the mathematical limitation of the LDA inverse transform in depth.
The inverse of the LDA does not necessarily make sense beause it loses a lot of information.
For comparison, consider the PCA. Here we get a coefficient matrix that is used to transform the data. We can do dimensionality reduction by stripping rows from the matrix. To get the inverse transform, we first invert the full matrix and then remove the columns corresponding to the removed rows.
The LDA does not give us a full matrix. We only get a reduced matrix that cannot be directly inverted. It is possible to take the pseudo inverse, but this is much less efficient than if we had the full matrix at our disposal.
Consider a simple example:
C = np.ones((3, 3)) + np.eye(3) # full transform matrix
U = C[:2, :] # dimensionality reduction matrix
V1 = np.linalg.inv(C)[:, :2] # PCA-style reconstruction matrix
print(V1)
#array([[ 0.75, -0.25],
# [-0.25, 0.75],
# [-0.25, -0.25]])
V2 = np.linalg.pinv(U) # LDA-style reconstruction matrix
print(V2)
#array([[ 0.63636364, -0.36363636],
# [-0.36363636, 0.63636364],
# [ 0.09090909, 0.09090909]])
If we have the full matrix we get a different inverse transform (V1) than if we simple invert the transform (V2). That is because in the second case we lost all information about the discarded components.
You have been warned. If you still want to do the inverse LDA transform, here is a function:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.utils.validation import check_is_fitted
from sklearn.utils import check_array, check_X_y
import numpy as np
def inverse_transform(lda, x):
if lda.solver == 'lsqr':
raise NotImplementedError("(inverse) transform not implemented for 'lsqr' "
"solver (use 'svd' or 'eigen').")
check_is_fitted(lda, ['xbar_', 'scalings_'], all_or_any=any)
inv = np.linalg.pinv(lda.scalings_)
x = check_array(x)
if lda.solver == 'svd':
x_back = np.dot(x, inv) + lda.xbar_
elif lda.solver == 'eigen':
x_back = np.dot(x, inv)
return x_back
iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
lda = LinearDiscriminantAnalysis()
Z = lda.fit(X, y).transform(X)
Xr = inverse_transform(lda, Z)
# plot first two dimensions of original and reconstructed data
plt.plot(X[:, 0], X[:, 1], '.', label='original')
plt.plot(Xr[:, 0], Xr[:, 1], '.', label='reconstructed')
plt.legend()
You see, the result of the inverse transform does not have much to do with the original data (well, it's possible to guess the direction of the projection). A considerable part of the variation is gone for good.
There is no inverse transform because in general, you can not return from the lower dimensional feature space to your original coordinate space.
Think of it like looking at your 2-dimensional shadow projected on a wall. You can't get back to your 3-dimensional geometry from a single shadow because information is lost during the projection.
To address your comment regarding PCA, consider a data set of 10 random 3-dimensional vectors:
In [1]: import numpy as np
In [2]: from sklearn.decomposition import PCA
In [3]: X = np.random.rand(30).reshape(10, 3)
Now, what happens if we apply the Principal Components Transformation (PCT) and apply dimensionality reduction by keeping only the top 2 (out of 3) PCs, then apply the inverse transform?
In [4]: pca = PCA(n_components=2)
In [5]: pca.fit(X)
Out[5]:
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
In [6]: Y = pca.transform(X)
In [7]: X.shape
Out[7]: (10, 3)
In [8]: Y.shape
Out[8]: (10, 2)
In [9]: XX = pca.inverse_transform(Y)
In [10]: X[0]
Out[10]: array([ 0.95780971, 0.23739785, 0.06678655])
In [11]: XX[0]
Out[11]: array([ 0.87931369, 0.34958407, -0.01145125])
Obviously, the inverse transform did not reconstruct the original data. The reason is that by dropping the lowest PC, we lost information. Next, let's see what happens if we retain all PCs (i.e., we do not apply any dimensionality reduction):
In [12]: pca2 = PCA(n_components=3)
In [13]: pca2.fit(X)
Out[13]:
PCA(copy=True, iterated_power='auto', n_components=3, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
In [14]: Y = pca2.transform(X)
In [15]: XX = pca2.inverse_transform(Y)
In [16]: X[0]
Out[16]: array([ 0.95780971, 0.23739785, 0.06678655])
In [17]: XX[0]
Out[17]: array([ 0.95780971, 0.23739785, 0.06678655])
In this case, we were able to reconstruct the original data because we didn't throw away any information (since we retained all the PCs).
The situation with LDA is even worse because the maximum number of components that can be retained is not 200 (the number of features for your input data); rather, the maximum number of components you can retain is n_classes - 1. So if, for example, you were doing a binary classification problem (2 classes), the LDA transform would be going from 200 input dimensions down to just a single dimension.

PYMC / Theano memory usage with sparse matrices

I am using the latest version of PYMC3 / Theano to compute the MAP estimate for the following simple logistic regression model:
w ~ N(0, 1)
y ~ Bin(σ(Xw))
My features are strictly categorical, and I am using one-hot encoding to transform the design matrix into a sparse (CSR) matrix of 0's and 1's. The sparse matrix occupies roughly 15MB, but I am seeing memory spikes up to 4GB. Incidentally, this is the size of a dense matrix with the same dimensions.
Below is a simplified version of my code,
from __future__ import division, print_function
import numpy as np
import pymc3 as pm
from scipy import sparse as sp
from scipy.optimize import fmin_l_bfgs_b
from scipy.special import expit
from theano import sparse as S
np.random.seed(0)
# properties of sparse design matrix (taken from the real data)
N = 100000 # number of samples
M = 5000 # number of dimensions
D = 0.002 # matrix density
# fake data
mu0, sd0 = 0.0, 1.0
w = np.random.normal(mu0, sd0, M)
X = sp.random(N, M, density=D, format='csr', data_rvs=np.ones)
y = np.random.binomial(1, expit(X.dot(w)), N)
# estimate memory usage
size = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes + y.nbytes
print('{:.2f} MB of data'.format(size / 1024 ** 2))
# model definition
with pm.Model() as model:
w = pm.Normal('w', mu0, sd=sd0, shape=M)
p = pm.sigmoid(S.dot(X, w.reshape((-1, 1)))).flatten()
pm.Bernoulli('y', p, observed=y)
print(pm.find_MAP(vars=[w], fmin=fmin_l_bfgs_b))
which produces the following memory profile:
Even if the L-BFGS optimizer computed the full dense Hessian (which it doesn't), that would take up only 190MB. I suspect that somewhere along the way, X is automatically converted into a Numpy array to implement some Theano op.
Has anyone successfully worked with sparse matrices in PYMC3? Any ideas where exactly the problem could lie?

Categories

Resources