How to generate a Random Sparse Hermitian Matrix in python?

I would like to generate a random sparse Hermitian matrix of a given shape in Python. How can I do it efficiently? Is there a built-in Python function for this task?
I have found a solution for a random sparse matrix, but I want the matrix to be Hermitian too. Here is the solution I found:
import numpy as np
import scipy.stats as stats
import scipy.sparse as sparse
import matplotlib.pyplot as plt
np.random.seed((3,14159))
def sprandsym(n, density):
    rvs = stats.norm().rvs
    X = sparse.random(n, n, density=density, data_rvs=rvs)
    upper_X = sparse.triu(X)
    result = upper_X + upper_X.T - sparse.diags(X.diagonal())
    return result
M = sprandsym(5000, 0.01)
print(repr(M))
# <5000x5000 sparse matrix of type '<class 'numpy.float64'>'
# with 249909 stored elements in Compressed Sparse Row format>
# check that the matrix is symmetric. The difference should have no non-zero elements
assert (M - M.T).nnz == 0
statistic, pval = stats.kstest(M.data, 'norm')
# The null hypothesis is that M.data was drawn from a normal distribution.
# A small p-value (say, below 0.05) would indicate reason to reject the null hypothesis.
# Since `pval` below is > 0.05, kstest gives no reason to reject the hypothesis
# that M.data is normally distributed.
print(statistic, pval)
# 0.0015998040114 0.544538788914
fig, ax = plt.subplots(nrows=2)
ax[0].hist(M.data, density=True, bins=50)  # density= replaces the removed normed= argument in newer matplotlib
stats.probplot(M.data, dist='norm', plot=ax[1])
plt.show()

We know that a matrix plus its conjugate transpose (Hermitian adjoint) is a Hermitian matrix. So, to ensure your final matrix B is Hermitian, just do
B = A + A.conj().T
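Putting the two pieces together, here is a minimal sketch of a random sparse Hermitian generator. The function name sprand_hermitian is only illustrative, and complex-valued entries are assumed (so that Hermitian differs from merely symmetric); the structure mirrors the sprandsym example above.
import scipy.sparse as sparse
import scipy.stats as stats

def sprand_hermitian(n, density):
    # draw real and imaginary parts independently from a standard normal
    rvs = stats.norm().rvs
    real = sparse.random(n, n, density=density, data_rvs=rvs)
    imag = sparse.random(n, n, density=density, data_rvs=rvs)
    A = real + 1j * imag
    # A + A^H is Hermitian by construction, and its diagonal is purely real
    return A + A.conj().T

H = sprand_hermitian(1000, 0.01)
# the difference H - H^H should have no stored non-zero entries
assert (H - H.conj().T).nnz == 0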

Related

Python scipy.sparse: how to efficiently set a set of entries to 0?

Let a be a big scipy.sparse matrix and IJ={(i0,j0),(i1,j1),...} a set of positions. How can I efficiently set all the entries of a at the positions in IJ to 0? Something like a[IJ]=0.
In Mathematica, I would create a new sparse matrix b with background value 1 (instead of 0) and zeros at the positions in IJ. Then I would use a=a*b (entry-wise multiplication). That does not seem to be an option here.
A toy example:
import scipy.sparse as sp
import numpy as np
np.set_printoptions(linewidth=200,edgeitems=5,precision=4)
m=n=10**1;
a=sp.random(m,n,4/m,format='csr'); print(a.toarray())
IJ=np.array([range(0,n,2),range(0,n,2)]); print(IJ) # every second diagonal entry
You are almost there. To go by your definitions, all you'd need to do is:
a[IJ[0],IJ[1]] = 0
Note that scipy will warn you:
SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
You can read more about that here.
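If you need to do many such assignments, here is a minimal sketch of the LIL route that the warning suggests, reusing the names from the toy example above; tolil, tocsr and eliminate_zeros are standard scipy.sparse methods.
import numpy as np
import scipy.sparse as sp

m = n = 10
a = sp.random(m, n, 4/m, format='csr')
IJ = np.array([range(0, n, 2), range(0, n, 2)])  # every second diagonal entry

# LIL supports cheap per-element assignment; convert, zero out, convert back
a = a.tolil()
a[IJ[0], IJ[1]] = 0
a = a.tocsr()
a.eliminate_zeros()  # drop any explicitly stored zeros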
The scipy sparse matrices can't have a non-zero background value. While it is possible to make a "sparse" matrix with lots of non-zero values, the performance (speed and memory) would be far worse than dense matrix multiplication.
A possible workaround is to rewrite such a matrix in terms of a genuinely sparse one. For example, if a matrix Y' contains mostly ones, it can be written as Y' = J - Y, where J is the all-ones matrix and Y = J - Y' is sparse; J never has to be built explicitly.
import scipy.sparse as sp
import numpy as np
size = (100, 100)
x = np.random.uniform(-1, 1, size=size)
y = sp.random(*size, 0.001, format='csr')
# entry-wise: Z = (J - Y) * X = X - Y * X, with J the (never built) all-ones matrix
z = x - y.multiply(x)
# the entry-wise product commutes, so A = X * (J - Y) reduces to the same expression
a = x - y.multiply(x)
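Applied back to the original question, a hedged sketch of the same workaround (Y is an illustrative name for the mask, and a and IJ are reused from the toy example): build a sparse mask with ones at the positions to clear, then subtract the masked entries.
import numpy as np
import scipy.sparse as sp

m = n = 10
a = sp.random(m, n, 4/m, format='csr')
IJ = np.array([range(0, n, 2), range(0, n, 2)])

# Y has a 1 at every position that should be zeroed out
Y = sp.csr_matrix((np.ones(IJ.shape[1]), (IJ[0], IJ[1])), shape=(m, n))
# entry-wise, a * (J - Y) = a - Y * a, so the dense all-ones J is never built
a = a - Y.multiply(a)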

Sparse Matrix error in MLPRegressor

Context
I'm running into an error when trying to use sparse matrices as an input to sklearn.neural_network.MLPRegressor. Nominally, this method is able to handle sparse matrices. I think this might be a bug in scikit-learn, but wanted to check on here before I submit an issue.
The Problem
When passing a scipy.sparse input to sklearn.neural_network.MLPRegressor I get:
ValueError: input must be a square array
The error is raised by the matrix_power function within numpy.matrixlib.defmatrix. It seems to occur because matrix_power passes the sparse matrix to numpy.asanyarray (L137), which returns an array of size=1, ndim=0 containing the sparse matrix object. matrix_power then performs some dimension checks (L138-141) to make sure the input is a square matrix; these fail because the array returned by numpy.asanyarray is not square, even though the underlying sparse matrix is square.
As far as I can tell, the problem stems from numpy.asanyarray preventing the dimensions of the sparse matrix from being determined. The sparse matrix itself has a shape attribute which would allow it to pass the dimension checks, but only if it's not run through asanyarray.
I think this might be a bug, but I don't want to go around filing issues until I've confirmed that I'm not just being an idiot! Please see below to check.
If it is a bug, where would be the most appropriate place to raise an issue? NumPy? SciPy? or Scikit-Learn?
Minimal Example
Environment
Arch Linux
kernel 4.15.7-1
Python 3.6.4
numpy 1.14.1
scipy 1.0.0
sklearn 0.19.1
Code
import numpy as np
from scipy import sparse
from sklearn import model_selection
from sklearn.preprocessing import StandardScaler, Imputer
from sklearn.neural_network import MLPRegressor
## Generate some synthetic data
def fW(A, B, C):
    return A * np.random.normal(.3, .1) + B * np.random.normal(.6, .1)

def fX(A, B, C):
    return B * np.random.normal(-1, .1) + A * np.random.normal(-.9, .1) / C
# independent variables
N = int(1e4)
A = np.random.uniform(2, 12, N)
B = np.random.uniform(2, 12, N)
C = np.random.uniform(2, 12, N)
# synthetic data
mW = fW(A, B, C)
mX = fX(A, B, C)
# combine datasets
real = np.vstack([A, B, C]).T
meas = np.vstack([mW, mX]).T
# add noise to meas
meas *= np.random.normal(1, 0.0001, meas.shape)
## Make data sparse
prob_null = 0.2
real[np.random.choice([True, False], real.shape, p=[prob_null, 1-prob_null])] = np.nan
meas[np.random.choice([True, False], meas.shape, p=[prob_null, 1-prob_null])] = np.nan
# NB: problem persists whichever sparse matrix method is used.
real = sparse.csr_matrix(real)
meas = sparse.csr_matrix(meas)
# replace missing values with mean
rmnan = Imputer()
real = rmnan.fit_transform(real)
meas = rmnan.fit_transform(meas)
# split into test/training sets
real_train, real_test, meas_train, meas_test = model_selection.train_test_split(real, meas, test_size=0.3)
# create scalers and apply to data
real_scaler = StandardScaler(with_mean=False)
meas_scaler = StandardScaler(with_mean=False)
real_scaler.fit(real_train)
meas_scaler.fit(meas_train)
treal_train = real_scaler.transform(real_train)
tmeas_train = meas_scaler.transform(meas_train)
treal_test = real_scaler.transform(real_test)
tmeas_test = meas_scaler.transform(meas_test)
nn = MLPRegressor((100,100,10), solver='lbfgs', early_stopping=True, activation='tanh')
nn.fit(tmeas_train, treal_train)
## ERROR RAISED HERE
## The problem:
# the sparse matrix has a shape attribute that would pass the square matrix validation
tmeas_train.shape
# but not after it's been through asanyarray
np.asanyarray(tmeas_train).shape
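# Illustrative expected values, given the setup above:
#   tmeas_train.shape                 -> (7000, 2)
#   np.asanyarray(tmeas_train).shape  -> (), a 0-d object array wrapping the
#   sparse matrix, which is why the square-matrix check in matrix_power fails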
MLPRegressor.fit(), as given in the documentation, supports a sparse matrix for X but not for y:
Parameters:
X : array-like or sparse matrix, shape (n_samples, n_features)
    The input data.
y : array-like, shape (n_samples,) or (n_samples, n_outputs)
    The target values (class labels in classification, real numbers in regression).
I am able to successfully run your code with:
nn.fit(tmeas_train, treal_train.toarray())
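As a hedged, self-contained illustration of the same point (the sizes and layer widths below are chosen only for the example): a sparse X is accepted, while y has to be passed as a dense array.
import numpy as np
from scipy import sparse
from sklearn.neural_network import MLPRegressor

X = sparse.random(200, 3, density=0.5, format='csr')  # sparse features are fine
y = np.random.normal(size=(200, 2))                    # dense targets

nn = MLPRegressor(hidden_layer_sizes=(10,), max_iter=500)
nn.fit(X, y)  # runs; passing y as a sparse matrix instead triggers the error described above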

Converting from Numpy.zeros(100,100) to using a Scipy.sparse.lil_matrix(100,100) Error

I am creating a finite volume solver, and had success using numpy.zeros to create a zero matrix and a for loop to fill specific locations of the matrix with values I wish to calculate.
However, I need to use a larger matrix, specifically numpy.zeros((102400, 102400)), but I get the error "Array too Big". I can create a numpy.zeros((10000, 10000)) matrix, but that seems to be the limit of my system (6 GB RAM).
I was told that changing the matrix into a sparse matrix would free up memory and allow me to do the calculations. However, my code that was initially written to fill a zero matrix cannot be used on this sparse matrix, and I don't know why.
import numpy as np
import scipy as sp
from scipy import sparse
matA = sp.sparse.lil_matrix(m, m)
matb = sp.sparse.lil_matrix(m, 1)
i = 0
for row in range(Lrow):
    for column in range(Lcol):
        if row == 0 and column == 0:
            matA[i, i + 1] = -k * (delY / delX)
            matA[i, i + Lcol] = -k * (delX / delY)
            matA[i, i] = -(3 * matA[i, i + 1] + matA[i, i + Lcol])
Edit: my m = 100000, and i is incremented at the end of the if statement by i = i + 1.
You are initializing your sparse matrices incorrectly. Take a look at the documentation for lil_matrix. Given that m is your shape parameter, you actually want to initialize the matrix as follows (note that the first argument is a tuple):
matA = scipy.sparse.lil_matrix((m, m))
matA
<100000x100000 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in LInked List format>
The way you are doing it you end up with a 1x1 matrix, which I assume is not your intent:
matA = scipy.sparse.lil_matrix(m, m)
matA
<1x1 sparse matrix of type '<class 'numpy.int64'>'
with 1 stored elements in LInked List format>
The reason for this is that the first argument for lil_matrix is looking for an array, or array-like, another sparse matrix, or a tuple. When you enter lil_matrix(m,m) it essentially ignores the second argument, as the first is interpreted as an array-like, and just initializes a 1x1 array with the value set to m.
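Here is a minimal sketch of the corrected setup; m, Lrow, Lcol, k, delX and delY are given small illustrative values (the question's real m is 100000).
import scipy.sparse as sp

m, Lrow, Lcol = 9, 3, 3
k, delX, delY = 1.0, 0.1, 0.1

# note the tuple argument: lil_matrix((m, m)), not lil_matrix(m, m)
matA = sp.lil_matrix((m, m))
matb = sp.lil_matrix((m, 1))

i = 0
for row in range(Lrow):
    for column in range(Lcol):
        if row == 0 and column == 0:
            matA[i, i + 1] = -k * (delY / delX)
            matA[i, i + Lcol] = -k * (delX / delY)
            matA[i, i] = -(3 * matA[i, i + 1] + matA[i, i + Lcol])
        i = i + 1

# convert to CSR (or CSC) before doing arithmetic or solving the system
matA = matA.tocsr()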

Calculate Similarity of Sparse Matrix

I am using Python with numpy, scipy and scikit-learn module.
I'd like to classify the arrays in a very big sparse matrix (100,000 * 100,000).
The values in the matrix are equal to 0 or 1. The only thing I have is the indices where the value is 1.
a = [1,3,5,7,9]
b = [2,4,6,8,10]
which means
a = [0,1,0,1,0,1,0,1,0,1,0]
b = [0,0,1,0,1,0,1,0,1,0,1]
How can I convert the index arrays to sparse arrays in scipy?
How can I classify those arrays quickly?
Thank you very much.
If you choose the sparse coo_matrix, you can create it by passing the indices like this:
from scipy.sparse import coo_matrix
import numpy as np

nrows = 100000
ncols = 100000
row = np.array([1, 3, 5, 7, 9])
col = np.array([2, 4, 6, 8, 10])
values = np.ones(col.size)
m = coo_matrix((values, (row,col)), shape=(nrows, ncols), dtype=float)
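A short hedged follow-up to the snippet above: COO is convenient for construction, but converting to CSR is usually the next step before row-wise work or before handing the matrix to scikit-learn estimators, most of which accept CSR input directly.
m_csr = m.tocsr()       # efficient row slicing and arithmetic
print(m_csr.getrow(1))  # the row containing the entry built from row=1, col=2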

Out of memory when using numpy's multivariate_normal random sampling

I tried to use numpy.random.multivariate_normal to do random sampling on some 30000+ variables, but it always took all of my memory (32 GB) and then terminated. Actually, the correlation is spherical and every variable is correlated to only about 2500 other variables. Is there another way to specify the spherical covariance matrix, rather than the full covariance matrix, or any other way to reduce the memory usage?
My code is like this:
cm = []  # covariance matrix
for i in range(width*height):
    cm.append([])
    for j in range(width*height):
        cm[i].append(corr_calc())  # corr is inversely proportional to the distance
mean = [vth]*(width*height)
cache_vth = numpy.random.multivariate_normal(mean, cm)
If your correlation is spherical, that is the same as saying that the value along each dimension is uncorrelated with the other dimensions, and that the variance along every dimension is the same. In that case you don't need to build the covariance matrix at all: drawing one sample from your 30,000-D multivariate normal is the same as drawing 30,000 samples from a 1-D normal. That is, instead of doing:
import numpy as np

n = 30000
mu = 0
corr = 1
cm = np.eye(n) * corr
mean = np.ones((n,)) * mu
np.random.multivariate_normal(mean, cm)
This fails when trying to build the cm array. Instead, try the following:
n = 30000
mu = 0
corr = 1
>>> np.random.normal(mu, corr, size=n)
array([ 0.88433649, -0.55460098, -0.74259886, ..., 0.66459841,
0.71225572, 1.04012445])
If you want more than one random sample, say 3, try
>>> np.random.normal(mu, corr, size=(3, n))
array([[-0.97458499, 0.05072532, -0.0759601 , ..., -0.31849315,
-2.17552787, -0.36884723],
[ 1.5116701 , 2.53383547, 1.99921923, ..., -1.2769304 ,
0.36912488, 0.3024549 ],
[-1.12615267, 0.78125589, 0.67133243, ..., -0.45441239,
-1.21083007, 1.45696714]])
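A hedged extension of the same idea: if the covariance is diagonal but the variances differ across dimensions, per-dimension standard deviations can be passed directly to np.random.normal, still without ever building the n x n matrix (the mean and std values below are illustrative).
import numpy as np

n = 30000
mean = np.full(n, 0.5)                     # per-dimension means
std = np.random.uniform(0.5, 2.0, size=n)  # per-dimension standard deviations

# equivalent to multivariate_normal(mean, np.diag(std**2)), but O(n) in memory
samples = np.random.normal(mean, std, size=(3, n))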
