I have run a classification experiment with 2 classifiers on a dataset with 2 classes and 150 samples. The classifiers are scikit-learn objects with a predict_proba() method, which returns an array of shape (samples, classes) holding the probability distribution for each sample. I also computed another matrix G with shape (samples, 2), which contains the "importance" of each classifier for each sample.
The final output must be a linear combination of the classifiers' predict_proba() rows, weighted by the corresponding scalars in G. Example with a single sample:
import numpy as np
G = np.array([0.3, 0.7])
classifier_1_proba = np.array([0.6, 0.4])
classifier_2_proba = np.array([0.2, 0.8])
Y = classifier_1_proba * G[0] + classifier_2_proba * G[1]
This is easy with just one sample/output, but I don't know how it could be done with multiple samples (e.g. an entire test set).
I think this would work for you:
Y = c1_proba * G[:, 0, None] + c2_proba * G[:, 1, None]
This assumes the probability matrices c1_proba, c2_proba and the weights G are all 2D NumPy arrays, as you mentioned. G[:, 0, None] has shape (samples, 1), so it broadcasts across the class columns of c1_proba.
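A minimal runnable check of the answer, using random stand-ins for the two predict_proba() outputs and for G (hypothetical data, not the asker's). It also shows an np.einsum form that gives the same result and extends to any number of classifiers once their probabilities are stacked.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 150, 2

# Hypothetical stand-ins for clf.predict_proba(X_test): each row sums to 1
c1_proba = rng.dirichlet(np.ones(n_classes), size=n_samples)
c2_proba = rng.dirichlet(np.ones(n_classes), size=n_samples)
# Hypothetical per-sample classifier weights, shape (samples, 2)
G = rng.dirichlet(np.ones(2), size=n_samples)

# Broadcasting version from the answer
Y = c1_proba * G[:, 0, None] + c2_proba * G[:, 1, None]

# Same computation for any number of classifiers:
# P has shape (classifiers, samples, classes), G has shape (samples, classifiers)
P = np.stack([c1_proba, c2_proba])
Y_einsum = np.einsum('csk,sc->sk', P, G)

print(Y.shape)                   # (150, 2)
print(np.allclose(Y, Y_einsum))  # True
```

Because each row of G sums to 1 here, each row of Y is again a probability distribution; with unnormalized weights that would no longer hold.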
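Let w, x, y, z be torch tensors of shape (m, n). I want to compute the following unbiased estimator efficiently (without for loops) for every row r = 1, ..., m:
$$\frac{1}{n(n-1)(n-2)(n-3)} \sum_{\substack{i, j, k, l = 1 \\ i, j, k, l \text{ pairwise distinct}}}^{n} w_{r,i}\, x_{r,j}\, y_{r,k}\, z_{r,l}$$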
For the unbiased estimator of the square of means only, i.e., for two tensors x and y:
$$\frac{1}{n(n-1)} \sum_{\substack{i, j = 1 \\ i \neq j}}^{n} x_{r,i}\, y_{r,j}$$
this is possible, e.g., using torch.einsum:
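import torch

def unbiased_mean_product(x, y):
    n = x.shape[1]
    batch_outer = torch.einsum('bi,bj->bij', x, y)  # (m, n, n) outer product per row
    zero_diag = 1 - torch.eye(n)                    # mask that drops the i == j terms
    return (batch_outer * zero_diag).sum(dim=2).sum(dim=1) / (n * (n - 1))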
However, for the fourth-power case this is not so easily doable, mostly because these are no longer square matrices and, in particular, because zeroing out all the diagonals (every term in which two indices coincide) becomes very tedious.
My questions:
1.) How can this be implemented efficiently, omitting any for loops?
2.) What time and memory complexity would that solution have in big O notation?
3.) Can this solution also be used with four 3D tensors of shape (m, k, n), where again we only want to do the computations along the axes of length n (dim=2)?
4.) If I want to do it in log space for numerical stability, i.e., using logsumexp for summations and sums for multiplications (because log(xy) = log(x) + log(y)), no solution with einsum would work anymore. How could the computation then be done in log space?
1.) This implementation seems to work, provided I didn't make a mess of the diagonal dimensions.
import torch as th

x = th.tensor([1., 4., 5., 3.])
y = th.tensor([5., 2., 4., 5.])
z = th.tensor([5., 7., 4., 5.])
w = th.tensor([3., 9., 5., 1.])
n = len(x)

# Broadcast to a 4D tensor with tensor[l, k, j, i] = w[l] * z[k] * y[j] * x[i]
tensor = x * y[:, None] * z[:, None, None] * w[:, None, None, None]

# th.diagonal(tensor, dim1=-2, dim2=-1) would only cover the i == j terms;
# the estimator must drop every term in which any two indices coincide
idx = th.arange(n)
l, k, j, i = th.meshgrid(idx, idx, idx, idx, indexing='ij')
distinct = (i != j) & (i != k) & (i != l) & (j != k) & (j != l) & (k != l)

result = (tensor * distinct).sum() / (n * (n - 1) * (n - 2) * (n - 3))
print(result)
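2.) Since the full n x n x n x n tensor is materialized, this takes O(n^4) time and memory per row, i.e. O(m·n^4) for all m rows. No matrix multiplication is involved, so the O(n^2.37) to O(n^3) bounds for matrix multiplication do not apply.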
3.) I don't see why not; just choose the dimensions to transpose and reduce over carefully, keeping k as an extra batch dimension.
4.) I don't see why this solution wouldn't work in log space.
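For question 4, a sketch of the masked approach in log space, assuming all entries are strictly positive (otherwise their logs are undefined) and n >= 4. It is written with NumPy rather than PyTorch (torch.logsumexp is the PyTorch counterpart); the function names are hypothetical. Products become sums of logs, terms with a repeated index are set to -inf so they vanish inside the log-sum-exp, and the normalization becomes a subtraction. It still builds an n^4 array per row, so it only suits small n.

```python
import numpy as np

def log_sum_exp(a, axis):
    """Numerically stable log(sum(exp(a))) over the given axes."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True)) + a_max
    return np.squeeze(out, axis=axis)

def log_unbiased_product4(lw, lx, ly, lz):
    """Log of the distinct-index estimator; inputs are logs of shape (m, n)."""
    m, n = lx.shape
    # t[r, i, j, k, l] = lw[r, i] + lx[r, j] + ly[r, k] + lz[r, l]
    t = (lw[:, :, None, None, None] + lx[:, None, :, None, None]
         + ly[:, None, None, :, None] + lz[:, None, None, None, :])
    i, j, k, l = np.meshgrid(*[np.arange(n)] * 4, indexing='ij')
    distinct = (i != j) & (i != k) & (i != l) & (j != k) & (j != l) & (k != l)
    t = np.where(distinct, t, -np.inf)  # exp(-inf) = 0 drops repeated-index terms
    log_norm = np.log(n * (n - 1) * (n - 2) * (n - 3))
    return log_sum_exp(t, axis=(1, 2, 3, 4)) - log_norm
```

Exponentiating the result recovers the estimator in linear space, which is a convenient way to check it on small inputs.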
P.S.: My knowledge of PyTorch is quite limited, but I'm sure you can define x, y, z, w in a more elegant way.
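For questions 1 to 3, a different technique from the answer above avoids building the n^4 tensor at all: inclusion-exclusion (Möbius inversion over set partitions) rewrites the sum over pairwise-distinct indices as a signed combination of ordinary row sums. With two inputs it reduces to sum_{i != j} x_i y_j = (sum_i x_i)(sum_j y_j) - sum_i x_i y_i. In general, each set partition of the inputs contributes the product, over its blocks B, of (-1)^(|B|-1) (|B|-1)! times the row sum of the elementwise product of the inputs in B. This NumPy sketch assumes the estimator is the distinct-index sum divided by n(n-1)(n-2)(n-3); the helper names are hypothetical, and porting to PyTorch should only need torch.stack(...).prod(dim=0) and .sum(dim=-1) in place of the NumPy calls. All reductions run over the last axis, so four inputs cost 15 partitions times O(m·n) time, with O(m·n) memory, and (m, k, n) inputs work unchanged.

```python
import math
import numpy as np

def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for idx in range(len(partition)):
            yield partition[:idx] + [[first] + partition[idx]] + partition[idx + 1:]
        # ...or into a block of its own
        yield [[first]] + partition

def sum_over_distinct(*arrays):
    """Sum of prod_a arrays[a][..., i_a] over pairwise-distinct i_a (last axis)."""
    total = 0
    for partition in set_partitions(list(range(len(arrays)))):
        term = 1
        for block in partition:
            # Möbius coefficient for a block of size s: (-1)^(s-1) * (s-1)!
            coef = (-1) ** (len(block) - 1) * math.factorial(len(block) - 1)
            prod = np.prod([arrays[a] for a in block], axis=0)
            term = term * coef * prod.sum(axis=-1)
        total = total + term
    return total

def unbiased_product_of_means(*arrays):
    """Row-wise unbiased estimator of the product of means over the last axis."""
    n = arrays[0].shape[-1]
    # math.perm(n, p) = n (n-1) ... (n-p+1), the number of distinct index tuples
    return sum_over_distinct(*arrays) / math.perm(n, len(arrays))
```

Because the coefficients are signed, this trick does not map directly onto logsumexp for question 4; the masked, all-positive sum is the form that translates to log space.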
I know this may be a question of semantics, but different articles explain the forward pass slightly differently. For example, the forward pass to a hidden layer in a standard neural network is sometimes written as np.dot(x, W), sometimes as np.dot(W.T, x), and sometimes as np.dot(W, x).
Take this image for example. They represent the input data as a matrix of [NxD] and weight data as [DxH] where H is the number of neurons in the hidden layer. This seems the most natural since input data will often be in tabular format with rows as samples and columns as features.
Now an example from the CS231n course notes. They discuss the example below and give the code used to compute it as:
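import numpy as np
f = lambda x: 1.0/(1.0 + np.exp(-x)) # activation function (use sigmoid)
# Hypothetical random parameters with the shapes the example implies
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)
x = np.random.randn(3, 1) # random input vector of three numbers (3x1)
h1 = f(np.dot(W1, x) + b1) # calculate first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2) # calculate second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3 # output neuron (1x1)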
Here W1 is [4x3] and x is [3x1]. I would expect the weight matrix to have dimensions [n_features, n_hidden_neurons], but in this example it seems they simply transposed it before using it.
I guess I am just confused about the general conventions for how data should be shaped and used consistently when computing neural network forward passes. Sometimes I see a transpose, sometimes I don't. Is there a standard, preferred way to represent data in accordance with diagrams like these? This question may be silly, but I just wanted to discuss it a bit. Thank you.
TL;DR
NumPy does not distinguish row vectors from column vectors, so a 1-D array can go on either side of np.dot. The two orders compute different things in general, though: np.dot(x, W) equals np.dot(W.T, x), not np.dot(W, x). With 2-D arrays, the last axis of the first array must match the first axis of the second.
Detailed explanation:
From a mathematical point of view, in a dot product an (nxm) and an (mx1) matrix must share the common dimension m, and the result has shape (nx1). From an implementation perspective, however, NumPy is more lenient about input shapes when one or both of them are 1-D arrays. As per the official documentation:
- If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
- If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
- If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.
- If a is an N-D array and b is a 1-D array, it is a sum-product over the last axis of a and b.
- If a is an N-D array and b is an M-D array (where M>=2), it is a sum-product over the last axis of a and the second-to-last axis of b.
Consider this toy example:
import numpy as np
x = np.array([4,5,6]) #shape - (3,)
W = np.array([[1,0,0],[0,1,0],[0,0,1]]) #shape - (3,3)
np.dot(x, W)
#Output - array([4, 5, 6])
np.dot(W.T, x)
#Output - array([4, 5, 6])
np.dot(W, x)
#Output - array([4, 5, 6])
All three calls are valid because x is 1-D and its only axis (length 3) matches the relevant axis of W. The result is a 1-D array of shape (3,) rather than a (1,3) row matrix or a (3,1) column matrix. The three results are equal here only because W is the identity matrix (and therefore symmetric); in general np.dot(x, W) equals np.dot(W.T, x), while np.dot(W, x) is a different product.
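With a non-symmetric matrix the three calls no longer agree. A small check, using an arbitrary W chosen only for illustration:

```python
import numpy as np

x = np.array([4, 5, 6])     # shape (3,)
W = np.array([[1, 2, 0],
              [0, 1, 0],
              [3, 0, 1]])   # arbitrary, not symmetric

a = np.dot(x, W)    # x acts as a row vector: sum_i x[i] * W[i, j] -> values 22, 13, 6
b = np.dot(W.T, x)  # same values as np.dot(x, W)
c = np.dot(W, x)    # x acts as a column vector: sum_j W[i, j] * x[j] -> values 14, 5, 18
print(a, b, c)
```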
This doesn't work as easily when taking the dot product of two 2-D arrays. Then you have to make sure that the last axis of the first array matches the first axis of the second array. Example:
W2 = np.array([[1,0,0],[0,1,0]]) #shape - (2,3)
np.dot(W, W2)
#Output - ValueError: shapes (3,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)
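Tying this back to the question: the two forward-pass conventions are transposes of each other, since (XW)^T = W^T X^T. A sketch with hypothetical random data, storing the same weights once as (D, H) for one-sample-per-row input and once as (H, D) for the CS231n-style column vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, H = 5, 3, 4               # samples, features, hidden units (arbitrary sizes)
X = rng.normal(size=(N, D))     # one sample per row (tabular layout)
W = rng.normal(size=(D, H))     # weights as [n_features, n_hidden]
b = rng.normal(size=H)

# Row-sample convention: (N, D) @ (D, H) -> (N, H)
h_rows = X @ W + b

# Column-sample convention: weights stored as (H, D), samples as columns (D, N)
W_cs = W.T
h_cols = W_cs @ X.T + b[:, None]   # (H, N)

print(np.allclose(h_rows, h_cols.T))  # True: same numbers, transposed layout
```

Neither layout is more correct; what matters is picking one and keeping the weight shapes consistent with it, since mixing them is what produces "not aligned" errors.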
I am trying to implement a multivariate Gaussian Mixture Model and to calculate the probability density function using tensors. There are n data points, k clusters, and d dimensions. So far, I have two tensors: an (n,k,d) tensor of centered data points and a kxdxd tensor of covariance matrices. I can compute an nxk matrix of probabilities by doing:
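import numpy as np
# points: (n, d) data, mu: (K, d) means, sigma: (K, d, d) covariances
n, d = points.shape
K = mu.shape[0]
centered = np.repeat(points[:,np.newaxis,:],K,axis=1) - mu[np.newaxis,:] # NxKxD
prob = np.zeros((n, K))
constant = 1 / np.power(2 * np.pi, d / 2)
for i in range(centered.shape[0]):      # loop over data points
    for k in range(centered.shape[1]):  # loop over clusters
        p = centered[i,k,:][np.newaxis] # 1xD
        power = -1/2 * (p @ np.linalg.inv(sigma[k,:,:]) @ p.T).item()
        prob[i,k] = constant / np.sqrt(np.linalg.det(sigma[k,:,:])) * np.exp(power)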
where sigma is the triangularized kxdxd matrix of covariances and centered holds my centered points. What is a more Pythonic way of doing this using NumPy's tensor capabilities?
Just a couple of quick observations:
I don't see you using p in the loop; is this a mistake? Using n instead?
The .T in centered[n,k,:].T does nothing; with that index the array is 1-D.
I'm not sure if np.linalg.inv can handle batches of arrays, allowing np.linalg.inv(sigma).
@ allows batches, as long as the last 2 dims are the ones entering into the dot product (with the usual last-of-A, second-to-last-of-B rule); einsum can also be used.
Again, does np.linalg.det handle batches?
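Following up on these points: np.linalg.inv and np.linalg.det both operate on stacks of matrices (any leading batch dimensions), so the double loop can be replaced with batched calls plus np.einsum. This sketch assumes points of shape (n, d), means mu of shape (k, d) and full covariance matrices sigma of shape (k, d, d), and computes the standard multivariate normal density (2π)^(-d/2) |Σ|^(-1/2) exp(-½ (x-μ)^T Σ^(-1) (x-μ)); the function name is hypothetical.

```python
import numpy as np

def gaussian_pdf_matrix(points, mu, sigma):
    """Return an (n, k) array of N(points[i] | mu[k], sigma[k]) densities."""
    n, d = points.shape
    centered = points[:, None, :] - mu[None, :, :]  # (n, k, d) by broadcasting
    sigma_inv = np.linalg.inv(sigma)                # (k, d, d), batched inverse
    det = np.linalg.det(sigma)                      # (k,), batched determinant
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu) for every (point, cluster) pair
    mahal = np.einsum('nkd,kde,nke->nk', centered, sigma_inv, centered)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * det)    # (k,), broadcast over n
    return norm * np.exp(-0.5 * mahal)
```

For larger d, working with a Cholesky factor (np.linalg.cholesky) is usually more stable than forming the explicit inverse; the inverse is kept here to mirror the question's code.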
Here is my python program:
import numpy as np
from sklearn import linear_model
X=np.array([[1, 2, 4]]).T**2
y=np.array([1, 4, 16])
model=linear_model.LinearRegression()
model.fit(X,y)
print('Coefficients: \n', model.coef_)
As a result, I have:
Coefficients:
[1.]
It is the first program I have tested with sklearn.
My question is: why do I have to use the transpose .T**2 in the third instruction?
Without .T**2, I get these errors: https://imgur.com/a/XWzJx0f
I use http://jupyter.org/try
As the documentation says, you have to pass a matrix of shape (n_samples, n_features), here (3, 1). So your input X, in the form [[1,2,4]], needs the inner vector in a vertical position.
After .T**2:
array([[ 1],
[ 4],
[16]])
This is what happens under the hood: https://machinelearningmastery.com/solve-linear-regression-using-linear-algebra/
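A NumPy-only illustration of the shapes involved and of the ordinary least-squares problem that LinearRegression solves by default (with an intercept), solved here with np.linalg.lstsq on a design matrix that has an extra column of ones. X.reshape(-1, 1) is a common alternative to .T for turning a 1-D feature array into a single column.

```python
import numpy as np

x = np.array([1, 2, 4])
print(np.array([[1, 2, 4]]).shape)  # (1, 3): one sample with three features
X = (x ** 2).reshape(-1, 1)         # (3, 1): three samples, one feature
y = np.array([1, 4, 16])

# Least squares with intercept: solve [X, 1] @ [slope, intercept] ≈ y
A = np.hstack([X, np.ones_like(X)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(slope, intercept)             # approximately 1.0 and 0.0

# Without squaring, the same fit gives a different slope
A_raw = np.hstack([x.reshape(-1, 1), np.ones((3, 1))])
(slope_raw, _), *_ = np.linalg.lstsq(A_raw, y, rcond=None)
print(slope_raw)                    # approximately 36/7 ≈ 5.14
```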
You have to match X and y in the same dimension (the same number of training samples).
If you do not use the transpose, you have 1 training sample [1,2,4] but 3 labels, which does not match.
If you use the transpose, you have 3 samples [1], [2], [4], which match the 3 labels.
The **2 does not matter.
The initial shape of matrix X is (1,3). You need to pass the matrix in the form (3,1), as the documentation says and as mentioned in Alessandro's answer.
The **2 part just squares each element of matrix X. You can run it without that part, but the coefficient will then differ. Currently, since you square, your (X, y) pairs are (1,1), (4,4) and (16,16), so the coefficient (the slope m of y = mx + c, if you plot these on a graph) is 1. If you don't square, the coefficient changes accordingly.
I am trying to implement a way to cluster points in a test dataset based on their similarity to a sample dataset, using Euclidean distance. The test dataset has 500 points, each an N-dimensional vector (N=1024). The training dataset has around 10000 points, each also a 1024-dim vector. The goal is to find the L2 distance between each test point and all the sample points to find the closest sample (without using any Python distance functions). Since the test array and training array have different sizes, I tried using broadcasting:
import numpy as np
dist = np.sqrt(np.sum( (test[:,np.newaxis] - train)**2, axis=2))
where test is an array of shape (500,1024) and train is an array of shape (10000,1024). I am getting a MemoryError. However, the same code works for smaller arrays. For example:
test= np.array([[1,2],[3,4]])
train=np.array([[1,0],[0,1],[1,1]])
Is there a more memory-efficient way to do the above computation without loops? Based on posts online, the L2 norm can be implemented with matrix multiplication as sqrt(X * X - 2*X * Y + Y * Y). So I tried the following:
x2 = np.dot(test, test.T)
y2 = np.dot(train,train.T)
xy = 2* np.dot(test,train.T)
dist = np.sqrt(x2 - xy + y2)
Since the matrices have different shapes, broadcasting fails with a dimension mismatch, and I am not sure what the right way to broadcast is (I don't have much experience with Python broadcasting). I would like to know the right way to implement the L2 distance computation as a matrix multiplication in Python when the matrices have different shapes. The resulting distance matrix should have dist[i,j] = Euclidean distance between test point i and sample point j.
Thanks
Here is broadcasting with shapes of the intermediates made explicit:
m = x.shape[0] # x has shape (m, d)
n = y.shape[0] # y has shape (n, d)
x2 = np.sum(x**2, axis=1).reshape((m, 1))
y2 = np.sum(y**2, axis=1).reshape((1, n))
xy = x.dot(y.T) # shape is (m, n)
dists = np.sqrt(x2 + y2 - 2*xy) # shape is (m, n)
The documentation on broadcasting has some pretty good examples.
I think what you are asking for already exists in scipy in the form of the cdist function.
from scipy.spatial.distance import cdist
res = cdist(test, train, metric='euclidean')
Simplified and working version from this answer:
x, y = test, train
x2 = np.sum(x**2, axis=1, keepdims=True)
y2 = np.sum(y**2, axis=1)
xy = np.dot(x, y.T)
dist = np.sqrt(x2 - 2*xy + y2)
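Two practical refinements, as a sketch. The original broadcasting attempt fails because test[:, np.newaxis] - train materializes a (500, 10000, 1024) array, roughly 41 GB in float64, whereas the expanded form only needs (m, n) arrays. Floating-point cancellation can make x2 - 2*xy + y2 slightly negative for near-identical points, which gives nan from np.sqrt, so it is clipped at zero first. If even the (m, n) intermediates are too large, the test set can be processed in chunks; the function name and chunk_size default are arbitrary choices.

```python
import numpy as np

def pairwise_l2(test, train, chunk_size=100):
    """dist[i, j] = Euclidean distance between test[i] and train[j], in row chunks."""
    y2 = np.sum(train ** 2, axis=1)                 # (n,)
    out = np.empty((test.shape[0], train.shape[0]))
    for start in range(0, test.shape[0], chunk_size):
        x = test[start:start + chunk_size]          # (c, d)
        x2 = np.sum(x ** 2, axis=1, keepdims=True)  # (c, 1)
        sq = x2 - 2 * x @ train.T + y2              # (c, n) by broadcasting
        # Clip tiny negative values caused by rounding before taking the root
        out[start:start + chunk_size] = np.sqrt(np.maximum(sq, 0))
    return out
```

On the small example from the question, pairwise_l2(test, train) matches the broadcasting version while never allocating the three-dimensional difference array.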
So the approach you have in mind is correct, but you need to be careful how you apply it.
To make your life easier, consider using the tested and proven functions from scipy or scikit-learn.