Im have N pairs of portfolio weights stored in a numpy array and would like to calculate portfolio risk which is w * E * w_T where w_T is weight transpose. The way I came up with is to loop through each weight pair and apply the matrix multiplication. Is there a vectorized approach to this such that given a weight pair (or if possible N weights that all sum to 1) I apply a single covariance matrix to each row to get the risk (ie without loop)?
import numpy as np
w = np.array([[0.2,0.8],[0.5,0.5]])
covar = np.array([0.000046,0.000017,0.000017,0.000032]).reshape([2,2])
w1 = w[0].reshape([1,2]) # each row in w
#portfolio risk
np.dot(np.dot(w1,covar),w1.T)
#Adam's answer is valid, but for big arrays, can result with very big temporary arrays (NxN), and unnecessary computations (computing the off-diagonal elements).
Here's a similar, yet much more efficient solution:
(I added another weight-pair, to distinguish between the different dimensions of the problem)
w = np.array([[0.2,0.8],[0.5,0.5], [0.33, 0.67]])
covar = np.array([0.000046,0.000017,0.000017,0.000032]).reshape([2,2])
(np.dot(w, covar) * w).sum(axis=-1)
=> array([ 2.77600000e-05, 2.80000000e-05, 2.68916000e-05])
By using plain-multiplication in the second step, I'm avoiding the unnecessary compuations of the off-diagonals.
EDIT: explaining the temporary arrays
# first multiplication (in both solutions)
np.dot(w, covar).shape
(3, 2)
# second, my solution
(np.dot(w, covar) * w).shape
(3, 2)
# second, Adam's solution
np.dot(np.dot(w,covar),w.T).shape
(3, 3)
Now, if you have N sets of weights you want to compute risk for (in this example N=3), and M instruments in your portfolio (here M=2), and N>>M, you get an array which is much bigger with Adam's solution (NxN). Not only that it will consume more memory, the computation populating the off-diagonal elements are expensive (matrix multiplication), and unnecessary.
It seems like your code is already set up for a vectorized approach, but you are only dealing with one row at a time. Grabbing the diagonals from the result when using your full weight matrix should give you what you want.
# portfolio risk
np.diagonal(np.dot(np.dot(w,covar),w.T))
Related
I have a matrix of 63695 row vectors of dim 384.
I would like to compute the cosine similarity model for this matrix.
I was thinking of vectorizing it.
How would one proceed to that objective?
If you look in scikit-learns source code you will see that X and Y are first normalized and then X_norm # Y_norm.T (dot product) is returned. Or if as in your case no Y exists it is X_norm # X_norm.T.
Normalizing and transposing can be discarded when looking at the runtime, but the matrix multiplaction of a (63695 x 384) matrix should take somewhere in the neighbourhood of 63695*63695 (elements in result matrix) times 384*384 (element-wise multiplactions and additions to calculate one element) calculations, so something like 63695*63695*384*384 = 598,236,810,854,400 operations. (Or strictly, that number of multiplications plus that same number of additions.)
And as you already mentioned it requires 4 (Bytes for float32) * 63695 * 63695 = ~16.2 GB of memory to handle that result matrix.
Do you really need that enormous matrix? What type of data are you handling and what are you trying to do? If we are talking about e.g. vector represenations of text data then you should look at removing duplicates, processing it in chunks or reducing the dimensionality before analysing similarity. If you are looking for something like ranking these cosine similarities and finding then k most similar ones you'd be much better of using algorithms for finding similar data points instead of doing it all by hand yourself.
Let w, x, y, z be torch tensors of shape (m, n) and we wish to compute the following unbiased estimator row-wise efficiently (without for loops), where I want to compute for every row 1, ..., m:
In case of only the unbiased estimator of the square of means, i.e., for :
this is possible, e.g., using torch.einsum:
batch_outer = torch.einsum('bi, bj -> bij', x, y)
zero_diag = 1-torch.eye(batch_outer.shape[1])
return (batch_outer * zero_diag).sum(dim=2).sum(dim=1) / (n * (n-1))
However, for the case to the power of four this is not so easy doable, mostly because these are not squared tensors and in particular, because the zeroing out of the diagonals becomes very tedious.
My questions:
1.) How can this be implemented efficiently ommitting any for loops?
2.) Which time and memory complexity would that solution have in big O notation?
3.) Can this solution also be used to do it with four 3D tensors of shape (m, k, n), where again we only want to do the computations along the axes of length n (dim=2)?
4.) If I want to do it in log-space for numerical stability, i.e., to use logsumexp for summations and sums for multiplications (because log(xy)= log(x)+log(y)), any solution with einsum wouldnt work anymore. How could that computation then be done in log space?
1 This implementation seems to work if I didn't make mess with the diagonal dimensions.
import numpy as np
import torch as th
x = np.array([1,4,5,3])
y = np.array([5,2,4,5])[np.newaxis]
z = np.array([5,7,4,5])[np.newaxis][np.newaxis]
w = np.array([3,9,5,1])[np.newaxis][np.newaxis][np.newaxis]
xth = th.Tensor(x)
yth = th.Tensor(y)
zth = th.Tensor(z)
wth = th.Tensor(w)
tensor = xth*th.transpose(yth, 0, 1)*th.transpose(zth,0,2)*th.transpose(wth,0,3)
diag = th.diagonal(tensor, dim1 = -2, dim2 = -1)
result = th.sum(tensor) - th.sum(diag)
result /= np.math.factorial(len(x))
print(result)
The order is between O(n^2.37..) - O(n^3), depending on the pytorch implementation of the matrix multiplication.
I don't see why not, just choose properly the dimensions to transpose and take the diagonal.
I don't see why would this solution won't work in a log-space.
pd: my knowledge in pytorch is quite limited, but I'm sure you can define x,y,z,w in a more elegant way.
I'm new to numpy, and found such strange(as for me) behavior.
I'm implementing logistic regression cost function, here I have 2 column vectors with same dimension and same types(dfloat). y contains bunch of zeros and ones, and a contains float numbers in range (-1, 1).
At some point I should get dot product so I transpose one and multiply them:
x = y.T # a
But when I use
x = y # a.T
occasionally performance decreases about 3 times, while results are the same
Why is this so? Isn't operations are the same?
Thanks.
The performance decreases, and you get a very different answer!
For vector multiplication (unlike number multiplication) a # b != b # a. In your case (assuming column vectors), a.T # b is a number, but a # b.T is a full-blown matrix! So, if your vectors are both of shape (1, y), the last operation will result in a (y, y) matrix, which may be pretty huge. Of course, it'll take way more time to compute such a matrix (a.k.a. add a whole lot of numbers and produce a whole lot of numbers), than to add a bunch of numbers and produce one single number.
That's how matrix (or vector) multiplication works.
In Tensorflow (python), given a matrix X of shape (n x d), where each row is a data point, I would like to compute the pairwise inner products of these n data points, i.e., the upper triangle of XX'.
Of course I could compute the whole XX' and fetch its upper triangle, but this means I would compute the off-diagonal elements twice. How to compute these efficiently in Tensorflow (python) by computing the inner product only once per pair?
With numpy, you can do this:
import numpy as np
A = np.random.randn(5, 3)
inds = np.triu_indices(5) # upper triangle indices
# expensive way to do it
ipu1 = np.dot(A, A.T)[inds]
# possibly less expensive way to do it.
ipu2 = np.einsum('ij,ij->i', A[inds[0]], A[inds[1]])
print(np.allclose(ipu1, ipu2))
This outputs True. Tensorflow does not have the triu_indices function build in, but it is not hard to write one if needed by looking at the numpy code. It does have einsum.
As the question suggests, I would like to compute the gradient with respect to a matrix row. In code:
import numpy.random as rng
import theano.tensor as T
from theano import function
t_x = T.matrix('X')
t_w = T.matrix('W')
t_y = T.dot(t_x, t_w.T)
t_g = T.grad(t_y[0,0], t_x[0]) # my wish, but DisconnectedInputError
t_g = T.grad(t_y[0,0], t_x) # no problems, but a lot of unnecessary zeros
f = function([t_x, t_w], [t_y, t_g])
y,g = f(rng.randn(2,5), rng.randn(7,5))
As the comments indicate, the code works without any problems when I compute the gradient with respect to the entire matrix. In this case the gradient is correctly computed, but the problem is that the result has only non-zero entries in row 0 (because other rows of x obviously do not appear in the equations for the first row of y).
I have found this question, suggesting to store all rows of the matrix in separate variables and build graphs from these variables. In my setting though, I have no idea how much rows might be in X.
Would anybody have an idea how to get the gradient with respect to a single row of a matrix or how I could omit the extra zeros in the output? If anybody would have suggestions how an arbitrary amount of vectors can be stacked, that should work as well, I guess.
I realised that it is possible to get rid of the zeros when computing derivatives with respect to the entries in row i:
t_g = T.grad(t_y[i,0], t_x)[i]
and for computing the Jacobian, I found out that
t_g = T.jacobian(t_y[i], t_x)[:,i]
does the trick. However it seems to have a rather heavy impact on computation speed.
It would also be possible to approach this problem mathematically. The Jacobian of the matrix multiplication t_y w.r.t. t_x is simply the transpose of t_w.T, which is t_w in this case (the transpose of the transpose is the original matrix). Thus, the computation would be as simple as
t_g = t_w