This question is about louvain in scanpy.
I would like to pass a specific adjacency matrix. However, when I tried the minimal example below, I got the error "Length of values (4) does not match length of index (6)". Is this mistake due to misuse of the sparse matrix?
Code:
import scanpy as sc
import torch
import numpy as np
import networkx as nx
nodes = [[0, 0, 0, 1], [0, 0, 0, 2], [0, 10, 0, 0], [0, 11, 0, 0], [1, 0, 0, 0], [2, 0, 0, 0]]
features = torch.tensor(nodes)
print(features.shape)
edgelist = [(0,1), (1,2), (2,3)]
G = nx.Graph(edgelist)
G_adj = nx.convert_matrix.to_scipy_sparse_matrix(G) # transform to scipy sparse matrix
adata = sc.AnnData(features.numpy())
sc.pp.neighbors(adata, n_neighbors=2, use_rep='X')
sc.tl.louvain(adata, resolution=0.01, adjacency=G_adj) # pass the adj here
y_pred = adata.obs['louvain'].astype(int).to_numpy()
n_clusters = len(np.unique(y_pred))
Could you point out what is wrong and provide an example of how to explicitly pass an adjacency matrix when using scanpy.tl.louvain? Thanks!
G is a graph created with four nodes, and thus G_adj is a (4, 4) sparse matrix.
adata is an AnnData object with 6 observations and 4 variables. The scanpy louvain algorithm clusters observations, and thus expects an adjacency matrix of shape (6, 6).
I'm not sure which of the two you intended, so here are both fixes.
If you truly have 6 nodes, alter the graph construction so that the two isolated nodes are included:
print(features.shape)
edgelist = [(0,1), (1,2), (2,3)]
G = nx.Graph()
G.add_nodes_from(range(6))  # include all 6 nodes, even the two without edges
G.add_edges_from(edgelist)
G_adj = nx.convert_matrix.to_scipy_sparse_matrix(G)  # now a (6, 6) scipy sparse matrix
adata = sc.AnnData(features.numpy())
If you instead have 4 nodes, alter the adata creation line so that the 4 nodes become the observations:
adata = sc.AnnData(features.numpy().T)
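Putting the first fix together, here is a minimal end-to-end sketch, assuming the same toy features and edges as in the question (and networkx < 3.0, where to_scipy_sparse_matrix still exists):
import scanpy as sc
import numpy as np
import networkx as nx
features = np.array([[0, 0, 0, 1], [0, 0, 0, 2], [0, 10, 0, 0],
                     [0, 11, 0, 0], [1, 0, 0, 0], [2, 0, 0, 0]], dtype=np.float32)
G = nx.Graph()
G.add_nodes_from(range(6))  # one node per observation
G.add_edges_from([(0, 1), (1, 2), (2, 3)])
G_adj = nx.to_scipy_sparse_matrix(G)  # (6, 6); use to_scipy_sparse_array in networkx >= 3.0
adata = sc.AnnData(features)
sc.pp.neighbors(adata, n_neighbors=2, use_rep='X')
sc.tl.louvain(adata, resolution=0.01, adjacency=G_adj)  # adjacency shape now matches n_obs
print(adata.obs['louvain'])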
Given a mixture of Gaussian distributions generated from data we supply ourselves, how do we figure out which component a new sample most likely belongs to?
I know MATLAB seems to have functions that compute this directly; is there anything similar in Python? I haven't found an answer so far.
import matplotlib.pyplot as plt
import numpy as np
import random
# Bivariate example
dim = 2
# Settings
n = 500
NumberOfMixtures = 3
# Mixture weights (non-negative, sum to 1)
w = [0.5, 0.25, 0.25]
# Mean vectors and covariance matrices
MeanVectors = [ [0,0], [-5,5], [5,5] ]
CovarianceMatrices = [ [[1, 0], [0, 1]], [[1, .8], [.8, 1]], [[1, -.8], [-.8, 1]] ]
# Initialize arrays
samples = np.empty((n, dim)); samples[:] = np.nan
componentlist = np.empty((n, 1)); componentlist[:] = np.nan
# Generate samples
for i in range(n):
    # Select the mixture component with probability given by the mixture weights
    DrawComponent = random.choices(range(NumberOfMixtures), weights=w, cum_weights=None, k=1)[0]
    # Draw a sample from the selected mixture component
    DrawSample = np.random.multivariate_normal(MeanVectors[DrawComponent], CovarianceMatrices[DrawComponent], 1)
    # Store results
    componentlist[i] = DrawComponent
    samples[i, :] = DrawSample
# Report fractions
print('Fraction of mixture component 0:', np.sum(componentlist == 0) / n)
print('Fraction of mixture component 1:', np.sum(componentlist == 1) / n)
print('Fraction of mixture component 2:', np.sum(componentlist == 2) / n)
# Visualize result
plt.plot(samples[:, 0], samples[:, 1], '.', alpha=0.5)
plt.grid()
plt.show()
The problem has been solved; the answer can be found at this link:
https://stackoverflow.com/questions/42971126/multivariate-gaussian-distribution-scipy
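For reference, here is a minimal sketch along the lines of that answer, assuming the mixture parameters w, MeanVectors, and CovarianceMatrices defined above: evaluate each weighted component density at the new sample with scipy.stats.multivariate_normal and pick the component with the largest posterior probability:
from scipy.stats import multivariate_normal
x_new = np.array([4.5, 5.2])  # hypothetical new sample
# Posterior responsibility of each component: proportional to w_k * N(x | mu_k, Sigma_k)
densities = np.array([w[k] * multivariate_normal.pdf(x_new, mean=MeanVectors[k], cov=CovarianceMatrices[k])
                      for k in range(NumberOfMixtures)])
posteriors = densities / densities.sum()
print('Posterior probabilities:', posteriors)
print('Most likely component:', np.argmax(posteriors))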
I am trying to generate a diagonal matrix from linear regression coefficients. First I generate an empty matrix, then I extract the coefficients from the regression model. Here's my code:
P = np.zeros((ncol, ncol), dtype = int)
intercep = np.zeros((1, ncol), dtype = int)
my_pls = PLSRegression(n_components = ncomp, scale=False)
model = my_pls.fit(x, y)
# extract the PLS coefficients:
coef = model.coef_
intercep = model.y_mean_ - (model.x_mean_.dot(coef))
P[(i-k):(i+k), i-k] = np.diag(coef[0:ncol])
But I get a zero matrix after running the code. Can anyone please help me work out how to get the diagonal matrix from the regression coefficients?
I'm not sure why you need to declare P first.
You can get a diagonal matrix directly from a 1D list/vector using numpy.diag:
import numpy
x = [3, 5, 6, 7]
numpy.diag(x)
Output:
array([[3, 0, 0, 0],
       [0, 5, 0, 0],
       [0, 0, 6, 0],
       [0, 0, 0, 7]])
For your case, try P = np.diag(coef.ravel()). Note that model.coef_ is 2-D, and np.diag applied to a 2-D array extracts its diagonal rather than building a diagonal matrix, so flatten it to 1-D first, as illustrated below.
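A quick illustration of that caveat, using a hypothetical 2-D array as a stand-in for model.coef_:
coef_2d = np.array([[3.0], [5.0], [6.0], [7.0]])  # hypothetical stand-in for model.coef_
print(np.diag(coef_2d))          # 2-D input: extracts the diagonal -> [3.]
print(np.diag(coef_2d.ravel()))  # 1-D input: builds the 4x4 diagonal matrix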
I'm trying to compute the cosine similarity between 350k sentences using TensorFlow.
My sentences are first vectorised using sklearn:
doc = df['text']
vec = TfidfVectorizer(binary=False,norm='l2',use_idf=False,smooth_idf=False,lowercase=True,stop_words='english',min_df=1,max_df=1.0,max_features=None,ngram_range=(1, 1))
X = vec.fit_transform(doc)
print(X.shape)
print(type(X))
This works very well and I get a sparse matrix back. I have then tried two ways to convert my sparse matrix to a dense one.
(1) I tried this:
dense = X.toarray()
This only works with a small amount of data (around 10k sentences), but then fails on the actual computation.
(2) I have been trying to convert the output X this way, but I get the same error message at the first step, when computing K:
K = tf.convert_to_tensor(X, dtype=None, dtype_hint=None, name=None)
Y = tf.sparse.to_dense(K, default_value=None, validate_indices=True, name=None)
Any tips or tricks to solve this mystery would be greatly appreciated. I'm also happy to consider batching my computations if that would be more memory-efficient.
You need to make a TensorFlow sparse matrix from your SciPy one. Since your matrix seems to be in CSR format, you can do it as follows:
import numpy as np
import scipy.sparse
import tensorflow as tf
def sparse_csr_to_tf(csr_mat):
    indptr = tf.constant(csr_mat.indptr, dtype=tf.int64)
    elems_per_row = indptr[1:] - indptr[:-1]
    # Row index of every stored element, repeated once per element in that row
    i = tf.repeat(tf.range(csr_mat.shape[0], dtype=tf.int64), elems_per_row)
    # Column indices are stored directly in CSR format
    j = tf.constant(csr_mat.indices, dtype=tf.int64)
    indices = np.stack([i, j], axis=-1)
    data = tf.constant(csr_mat.data)
    return tf.sparse.SparseTensor(indices, data, csr_mat.shape)
# Test
m = scipy.sparse.csr_matrix([
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [2, 0, 3, 4],
], dtype=np.float32)
tf_mat = sparse_csr_to_tf(m)
tf.print(tf.sparse.to_dense(tf_mat))
# [[0 0 1 0]
#  [0 0 0 0]
#  [2 0 3 4]]
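With the sparse tensor in place, one way to get cosine similarities without materialising the full 350k x 350k matrix is to multiply the sparse matrix against dense batches of itself. Since TfidfVectorizer was called with norm='l2', the rows are already unit length, so the dot products are the cosine similarities. A rough sketch, with batch_size as an assumed value to tune to your memory:
batch_size = 1024  # assumed value; tune to available memory
n = X.shape[0]
for start in range(0, n, batch_size):
    stop = min(start + batch_size, n)
    batch = tf.constant(X[start:stop].toarray())  # densify only this batch of rows
    # (n, batch) block of cosine similarities: every sentence vs. the batch
    sims = tf.sparse.sparse_dense_matmul(tf_mat, batch, adjoint_b=True)
    # ... process or store sims for this batch ...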
What is the easiest and fastest way to interpolate between two arrays to get a new array?
For example, I have 3 arrays:
x = np.array([0,1,2,3,4,5])
y = np.array([5,4,3,2,1,0])
z = np.array([0,5])
x, y correspond to data points and z is an argument. So at z=0 the x array is valid, and at z=5 the y array is valid. But I need to get a new array for z=1. This case could easily be solved by:
a = (y-x)/(z[1]-z[0])*1+x
The problem is that the data is not linearly dependent and there are more than 2 arrays of data. Perhaps spline interpolation could be used somehow?
This is a univariate-to-multivariate interpolation problem. SciPy supports univariate-to-univariate and multivariate-to-univariate interpolation, but you can iterate over the outputs, so this is not such a big problem. Below is an example of how it can be done. I've changed the variable names a bit and added a new point:
import numpy as np
from scipy.interpolate import interp1d
X = np.array([0, 5, 10])
Y = np.array([[0, 1, 2, 3, 4, 5],
              [5, 4, 3, 2, 1, 0],
              [8, 6, 5, 1, -4, -5]])
XX = np.array([0, 1, 5])  # Find YY for these
YY = np.zeros((len(XX), Y.shape[1]))
for i in range(Y.shape[1]):
    f = interp1d(X, Y[:, i])
    for j in range(len(XX)):
        YY[j, i] = f(XX[j])
YY now holds the interpolated results for XX. Hope it helps.
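As a side note, interp1d can interpolate all columns in one call via its axis argument, which removes both Python loops; and for the spline interpolation mentioned in the question, the kind argument (e.g. kind='quadratic', which needs at least 3 data points) gives a smooth fit. A minimal sketch with the same X, Y, XX as above:
f = interp1d(X, Y, axis=0)  # linear interpolation along the rows of Y
YY = f(XX)                  # shape (len(XX), Y.shape[1])
f_smooth = interp1d(X, Y, axis=0, kind='quadratic')  # spline variant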
I'm working on a very simple example of random walk simulations using numpy. My professor insists that we use numpy's broadcasting functionality instead of for loops as much as we can, and I want to know if it's possible to broadcast dictionary lookups.
e.g. I have the array ['E', 'W', 'N', 'S']. Mapping that array through the dictionary would result in [[1, 0], [-1, 0], [0, 1], [0, -1]].
import numpy as np
import matplotlib.pyplot as plt
def random_path(origin, nsteps, choices, choice_probs, choice_map):
    directions = np.random.choice(choices, size=(nsteps,), p=choice_probs)
    print(directions)

def main():
    directions = ['N', 'S', 'E', 'W']
    dir_probabilities = [.2, .3, .45, .05]
    dir_map = {'N': [0, 1], 'S': [0, -1], 'E': [1, 0], 'W': [-1, 0]}
    origin = [0, 0]
    np.random.seed(12345)
    path = random_path(origin, 15, directions, dir_probabilities, dir_map)

main()
Why not ignore the actual directional labels and store the directions as a (4, 2) shaped numpy array? Then you can index into that array directly.
def random_path(origin, nsteps, choices, choice_probs, choice_map):
    directions = np.random.choice(choices, size=(nsteps,), p=choice_probs)
    return directions

dir_map = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])
# Everything else is the same as defined by the OP
path_directions = random_path(origin, 15, np.arange(4), dir_probabilities, dir_map)
path = dir_map[path_directions]
Now path is a (15,2) shaped numpy array containing the sequence of moves from the dir_map.
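To turn those moves into actual positions, still without a Python loop, a cumulative sum over the step vectors gives the full walk; a small sketch assuming the variables above:
moves = dir_map[path_directions]               # (15, 2) step vectors
positions = origin + np.cumsum(moves, axis=0)  # running position after each step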